ABSTRACT


The  technique  of  formal  abstraction  provides  an 
appropriate  tool  for  specifying  an  interface  between  layers 
of  computer  hardware  and  software.  An  abstract  machine  called 
AM  has  been  built  to  address  the  problem  of  portability  and 
reusability  of  software.  This  thesis  is  the  design  and 
implementation  of  a  "C"  compiler  for  this  abstract  machine. 
I. INTRODUCTION


In  today’s  computer  world,  portability  is  a  well-known 
problem  which  arises  in  a  variety  of  situations.  Since 
computer  software  evolves  in  connection  with  a  particular 
hardware  environment,  and  often  assumes  features  closely 
related to characteristics of its own hardware, this problem
has  been  unavoidable. 

Formalizing the relationship between hardware and
software resources was treated in a previous NPS thesis by
Yurchak [Ref. 1], whose efforts resulted in the specification
and implementation of an abstract machine, called AM.

The abstraction of a bit-mapped display resource was
added to AM in another NPS thesis by Hunter [Ref. 2].

Finally, an abstraction of a formally specified reusable
database was added to the same machine by Zang [Ref. 3].

This  presentation  is  a  further  extension  of  the  work 
started  by  Yurchak  and  Hunter:  An  abstract  computer  and  its 
programming  environment.  Its  major  objective  is  a  compiler 
for  a  subset  of  the  C  language  for  AM. 

A.  THE  PORTABILITY  PROBLEM 

It is well known that moving large programs from one
machine to another is frustrating work. It is also known
that once the software has been moved to the new machine, it
is not predictable whether or not it will work as before.
Even if it seems to work, it may consume more resources than
expected.

For a couple of reasons, the portability problem is
getting worse, not better:

- Computer architectures have been changed to make them
look like what the programmer wants

- The number of devices included in modern
architectures has been maximised

- Both languages and machines are related to the data they
manipulate in an implementation-dependent way

These and other factors make portability a difficult
problem, and in addition, they affect other difficult
issues like language design and software engineering.

B.  CURRENT  IMPLEMENTATIONS  TO  SOLVE  PORTABILITY  PROBLEM 

The  usage  of  high  level  languages  provides  a  degree  of 
high  level  abstraction,  and  provides  some  measure  of  software 
standardization  and  portability.  But  the  portability  of  high 
level  languages  is  limited,  since  all  the  layers  of  software 
below  this  high  level  have  to  be  moved,  in  order  to  port  such 
a  system. 

There are other abstraction levels between the computer
hardware and the application environments. Operating systems
in particular represent a software abstraction of
physical resources, and support the layers of software built
over this level. Starting with CP/M and UNIX, we have seen
some good implementations that provide such an abstract
level to some degree.

The main idea of the AM machine is to abstract and formally
define other physical resources found in typical computing
systems.



The Abstract Machine (AM) is a result of Yurchak's [Ref. 1]
and Hunter's [Ref. 2] efforts to solve the problem of
formalizing the relationship between hardware and software
resources. It is implemented as a finite state machine
interpreter, with an assembler. Details of the newest version
of the AM assembler can be found in Zang's [Ref. 3] thesis.

"Abstraction" describes the separation of the defining
properties of an object from other, unnecessary details about
it. A programmer is primarily concerned with solving a
problem. Appropriately, the tools at his disposal, such as
programming languages, development aids, and the programming
environment, form a problem solving abstraction. The hardware
(and some of the software) on which this problem solving
abstraction is implemented, however, is an abstraction of a
different sort.

The fuzzy area between software and physical resource
abstractions, sometimes simplistically perceived as the
boundary between hardware and software, exposes a number of
shortcomings in language design and computer architecture
collectively termed the "semantic gap".

Narrowing the semantic gap requires significant changes
in the fundamentals of computer architecture and language
design. Three major factors which significantly contribute to
this problem are:

- Informally described semantics;

- Representation dependent data types;

- Arbitrarily designed instruction set architectures.

The AM was designed to fill this semantic gap by
addressing the above problems [Ref. 1].

In  the  AM  implementation,  a  text  file  representing  an 
assembly  language  program  is  translated  by  the  assembler  into 
a  relocatable  object  module.  A  loader,  part  of  the  AM 
interpreter,  loads  this  object  module  into  the  appropriate 
cells,  and  AM  executes  it. 

The  following  presentation  is  an  implementation  of  a 
subset  of  the  high  level  language  "C",  for  that  abstract 
machine.  It  is  a  compiler  which  compiles  C  source  code  and 
generates  assembler  source  code  for  the  AM. 


III.  DISCUSSION  OF  "C"  SUBSET 


Since commercial-quality compilers are very large programs
that take on average six man-years to write, this research
work had to be limited to a small subset of the C
language.

The  goal  was  to  write  a  small  portion  of  C  in  the  C 
language  itself,  and  then  by  feeding  the  output  of  this  work 
into  itself,  to  create  a  native  code  C  compiler. 

Since  this  work  was  going  to  be  a  race  against  time,  the 
subset  had  to  be  as  small  as  possible,  but  on  the  other  side, 
had  to  be  large  enough  to  be  able  to  compile  its  own  source 
code. 

The  sub-goal  was  to  use  a  strictly  limited  number  of 
features  to  write  the  compiler,  because  any  new  feature  used 
in  the  code  would  require  implementation  of  the  same  feature 
in  the  compiler. 

The outcome of this work was not sophisticated enough to
compile itself. It evolved as a small subset of the C
programming language, so called "Tiny-C". And since it was
not sufficient to compile its own code, it is used as a
cross-compiler from host MS-DOS computers to the target
machine AM.

A. TINY-C SUBSET

Tiny-C  is  a  small  subset  of  C,  and  a  thesis  project 
more  than  a  language.  There  are  many  features  which  a  real 
programming  language  has  to  have,  but  Tiny-C  does  not. 


The  Tiny-C  compiler  was  written  in  five  months  and  is 
considered  to  have  the  fundamental  structure  of  a  real 
compiler.  Hopefully  it  will  be  modified  and  improved  in  the 
future,  and  may  be  usable  for  real  applications. 

Appendix  A  is  a  listing  of  the  Tiny-C  language  grammar. 
But  this  grammar  is  obviously  not  the  complete  "C" 
language.  At  least: 

-  Structure  and  union  specifiers  are  not  included. 

-  Functions  are  not  allowed  to  return  addresses. 

- Assignments inside expressions are not allowed,
because they were considered as making programs
"unreadable". For instance:

"if ((Joe = Jimmy+5) > 5)" is not allowed in Tiny-C.

- Multiple assignments are not implemented. For instance:
"Joe = Jimmy = 15 * mary;" is an invalid statement in
Tiny-C.

B.  THE  TINY-C  COMPILER 

Even though the Tiny-C language subset was planned within
the limits of this thesis, the Tiny-C compiler can only
compile and generate code for an even smaller subset of the
above grammar.

The Tiny-C compiler implementation can parse the whole
Tiny-C subset and give proper error messages if necessary.
But,

- due to the time constraints, and

- due to the restricted capabilities of the target AM
machine

the Tiny-C compiler cannot generate code for the whole Tiny-C
language.

In the Tiny-C compiler:

- Floating point arithmetic is not implemented, because it
is not supported by AM.

- Bitwise and shift expressions are not implemented, since
they are not supported by AM.

- Since AM has strictly defined data types and does not
allow type conversions, address, pointer and array types
are not implemented.

- Since AM is designed as an operating system independent
software machine, the "#include" preprocessor command is not
implemented.

- Since AM does not have a linker yet, external
declarations are not implemented.

- Auto, static, register, and boolean types are not
implemented.
This chapter describes the Tiny-C compiler step by step.
But obviously the purpose of this presentation is not to
teach the "compiler writing art", or to explain the target
Abstract Machine's assembler. Complete documentation for the
AM Assembler can be found in Yurchak's thesis [Ref. 1]. For a
better understanding of the following structures, Ullman's
"Compilers, Techniques and Tools" [Ref. 4] is recommended as
a background reference for compiler writing.

The  Tiny-C  Compiler  is  written  in  nine  steps.  These  are: 

-  Scanner  or  Lexical  Analyzer 

-  Grammar 

-  Recursive  Descent  Parser  with  Backtracking 

-  Data  Structures  for  the  Parser 

-  Error  Checking  and  Error  Messages 

-  Emission  of  Intermediate  Code 

-  Intermediate  Code  Optimization 

-  Data  Structures  for  the  Code  Generator 

-  Target  Code  Generation 

We will first go through these steps briefly in order to
get  acquainted  with  the  architecture  of  the  Tiny-C  compiler. 

A.  SCANNER  AND  LEXICAL  ANALYZER 

In general, scanners and lexical analyzers are language
independent structures. The same scanner may be used for a
couple of different compilers. For this reason we will
introduce this structure even before discussing the Tiny-C
grammar.

Contrary  to  the  header  of  this  section,  Tiny-C  does  not 
have  a  scanner  or  lexical  analyzer  in  the  classical  sense. 

Even though the most common way of writing compilers is
analyzing the input data stream lexically, and after
tokenizing, passing tokens to the parser as they are needed,
this was not the way scanning was implemented in this
compiler. The Tiny-C scanner is made up of a couple of
routines used by a recursive descent parser with a
backtracking tool. There is no tokenized data stream.

The idea is to read the input stream into a scanner
buffer (which is implemented as a ring buffer) and parse
it there. This technique gives an ability to backtrack and
makes it possible to write a very simple recursive descent
top-down parser. With such a backtracking tool, the grammar
does not need to be massaged into a fully LL(1) grammar; it
can be parsed even if it is ambiguous in the LL(1) sense. In
any ambiguous case, the parser can try all possible options by
backtracking.

Let's start by introducing our scanner buffer and its
initialization.

init_buf()  /* initialize scanner buffer */
{
Reads input source file into scanner buffer. Sets the
pointers for the current place (for the initializing procedure,
it is simply the beginning of the scanner buffer) and for the
very last character in the scanner buffer.
}

The  scanner  may  or  may  not  read  the  whole  input  stream 
at  once,  because  its  ring  buffer  has  a  limited  size.  Now  the 
next  question  is  how  to  get  a  character  from  this  buffer 
(since  tokens  are  not  used,  we  have  to  deal  with 
characters),  and  if  it  is  the  end  of  the  characters,  how  to 
read  some  more  input  into  this  ring  buffer. 


char getchr()  /* get character routine */
{
Gets the next character from scanner buffer, and loads
it into global "nextch". If it reaches the current end of
the scanner buffer, it reads some more text from the source
into scanner buffer. If it meets the end of file character,
it sets the "file_end" flag TRUE.
}

We can even put a character back into the buffer, if
needed.


ungetchr()  /* un-get character */
{
Puts a given character back into scanner buffer.
}
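
The three routines above can be sketched as follows. This is only a
minimal sketch under stated assumptions, not the Tiny-C source: the
input comes from an in-memory string instead of a file, the ring size
RING_SIZE, the helper refill() and the positions rd and wr are
hypothetical, and ungetchr() is assumed to take the character to put
back as its argument.

```c
#include <stddef.h>

#define RING_SIZE 8            /* deliberately tiny; the real size is not given */

const char *src;               /* hypothetical stand-in for the source file */
char ring[RING_SIZE];          /* scanner buffer used as a ring */
size_t rd, wr;                 /* absolute read and write positions */
int file_end;                  /* set once the source is exhausted */
char nextch;                   /* most recently fetched character */

/* Copy more source text into the ring without overwriting unread cells. */
static void refill(void)
{
    while (wr - rd < RING_SIZE && *src != '\0')
        ring[wr++ % RING_SIZE] = *src++;
}

void init_buf(const char *text)
{
    src = text;
    rd = wr = 0;
    file_end = 0;
    refill();
}

char getchr(void)
{
    if (rd == wr)              /* reached the current end of the buffer */
        refill();
    if (rd == wr) {            /* nothing left in the source either */
        file_end = 1;
        return nextch = '\0';
    }
    return nextch = ring[rd++ % RING_SIZE];
}

/* Put c back in front of the read position, if there is a free cell. */
void ungetchr(char c)
{
    if (rd > 0 && wr - rd < RING_SIZE)
        ring[--rd % RING_SIZE] = c;
}
```

Because positions are kept as absolute counts and reduced modulo the
ring size only on access, the same cells are reused as the source is
consumed.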


After  initializing  the  scanner  buffer,  we  can  get  as 
many  characters  from  there  as  we  want  to.  But  parsers  are 
higher  level  concepts,  and  they  shouldn’t  deal  with  the  low 
level  structures  of  scanning  like  getting  three  more 
characters  or  putting  back  one.  Parsers  mostly  work  on 
tokens. If we had a pure tokenized implementation, we could
simply  pop  a  token  number  from  the  scanner  buffer.  But  here 
we  need  something  to  give  tokens  to  the  parser.  Also  white 
characters  and  comments  should  be  ignored. 


String tokens are given to the parser by the following
routine:


matchtoken(str, whtchk)  /* match to a given string token */

char str[],  /* string token */
     whtchk; /* boolean variable for white chr. check */
{
This routine attempts to read the string token from
scanner buffer. A following white character or delimiter is
optional, and this decision is made by the caller, namely
the parser. It returns TRUE if the token matches (and a white
character, optionally), else returns FALSE. In case of
FALSE, it backtracks in the scanner buffer to its previous
place.
}
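
A minimal sketch of such a token matcher is given below. It is an
illustration under assumptions rather than the Tiny-C routine itself:
the scanner buffer is reduced to a plain string pointer scan_p
(hypothetical), skip_white() is a hypothetical helper, and the white
character check is interpreted as "the token must not be glued to a
following letter, digit or underscore".

```c
#include <ctype.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

const char *scan_p;   /* hypothetical current position in the scanner buffer */

void skip_white(void)
{
    while (isspace((unsigned char)*scan_p))
        scan_p++;
}

/* Try to read token str at the current position. If whtchk is nonzero,
   the token must be followed by a white character or delimiter. */
int matchtoken(const char *str, int whtchk)
{
    const char *oldp;
    size_t n = strlen(str);

    skip_white();
    oldp = scan_p;                 /* place to return to on failure */
    if (strncmp(scan_p, str, n) != 0)
        return FALSE;              /* nothing was consumed */
    scan_p += n;
    if (whtchk && (isalnum((unsigned char)*scan_p) || *scan_p == '_')) {
        scan_p = oldp;             /* e.g. "if" must not match "iffy" */
        return FALSE;
    }
    return TRUE;
}
```

On failure the position is restored to the first character after the
white ones, so a later attempt does not have to skip them again.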

The following routine attempts to match a single
character in the scanner buffer and returns a boolean result.


match(chr)  /* match to a single character */

char chr;   /* character to match */
{
    delwht();

    /* if character matches, return TRUE */
    if (nextch == chr)
    {
        nextch = getchr();
        return (TRUE);
    }
    return (FALSE);
}


Both of these routines delete white characters first.
In case of FALSE, they do not backtrack exactly to their
previous places; otherwise the following routines would
have to skip the white characters one more time. So, in the
FALSE or "unmatched" case, they backtrack to the very
first character which comes just after the white ones.


delwht()  /* delete white characters */
{
Used by both match-character and match-token routines and
skips all the following white characters (blank, tab,
carriage return and line feed characters) and the comments in
the scanner buffer.
}
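
The skipping of white characters and comments can be sketched as
below. This is a hedged illustration, not the original routine: the
scan position dw_p is a hypothetical plain string pointer, and
comments are assumed to be the usual C form that opens with a slash
and an asterisk.

```c
const char *dw_p;   /* hypothetical scan position in the scanner buffer */

void delwht(void)
{
    for (;;) {
        /* blank, tab, carriage return and line feed */
        while (*dw_p == ' ' || *dw_p == '\t' || *dw_p == '\r' || *dw_p == '\n')
            dw_p++;
        if (dw_p[0] != '/' || dw_p[1] != '*')
            return;                       /* not at a comment: done */
        dw_p += 2;                        /* skip the opening marker */
        while (*dw_p != '\0' && !(dw_p[0] == '*' && dw_p[1] == '/'))
            dw_p++;
        if (*dw_p != '\0')
            dw_p += 2;                    /* skip the closing marker */
    }
}
```

The outer loop is what allows any mixture of white characters and
several consecutive comments to be removed in a single call.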

B.  GRAMMAR 

Since  there  is  not  a  standard  C  language  grammar,  we  had 
to  first  write  a  grammar  to  parse.  The  Tiny-C  subset  was 
discussed  in  the  previous  chapter,  and  its  complete  grammar 
is  presented  in  Appendix  A. 

In this grammar (Appendix A), any terminal or non-terminal
followed by a "*" character means "none or more,"
followed by a "+" character means "one or more," and
followed by a "?" character means "optional" or "none or
one." Under these definitions, for example:

program:
<pre-processor>* <data-definition>* <function-definition>+

The non-terminal <program> goes to any number of
<pre-processor>, followed by any number of <data-definition>,
and followed by one or more <function-definition>.

The "|" character means "or". For example:

pre-processor:
"#define" <file-definition> |
"#include" <file-definition>

Thus, <pre-processor> goes to "#define" followed
by <file-definition>, or "#include" followed by
<file-definition>.

The "!" character means "allowed at most once." For
example:

"switch" "(" <arithmetic-expression> ")" "{"
"case" | "default"! <constant-expression> ":"
<statement>*

Thus, <switch-statement> can go to "default" at most
once.

C.  PARSER 

A  very  simple  form  of  a  working  parser  is  presented  in 
Appendix  B.  It  is  a  recursive  descent  parser  but  with  a 
backtracking  feature.  There  is  a  one-to-one  correspondence 
between  non-terminal  names  in  the  grammar  and  function 
names  in  the  parser.  The  reader  is  encouraged  to  read  the 
parser with an eye on the grammar. With the grammar's help,
it  is  not  difficult  to  understand  the  structure  of  the 
parser. 

In this first version of the Tiny-C parser, all
functions backtrack if they fail. In the real Tiny-C
environment this is largely unnecessary, because in
the Tiny-C grammar, ambiguity exists in a few places only.
The reason this first version is presented in Appendix B is
its clarity and simplicity. In the following versions,
unnecessary backtracks have been taken out.

In all the routines in the parser, there are two
backtracking tools. First, the "oldp" old pointer points to
the parser's previous place in the scanner buffer, and
second, the "line_no" line number keeps track of the current
line number for error checking purposes. If a function fails,
these routines backtrack to their previous states and try to
find another legal path to parse.
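
The save-and-restore pattern can be illustrated with a small sketch.
It is not taken from Appendix B: the two productions (assignment and
call), the helper routines and the scan pointer bp are hypothetical,
chosen only because both alternatives begin with an identifier and so
cannot be separated by one symbol of lookahead.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

const char *bp;      /* hypothetical scan position in the scanner buffer */
int line_no = 1;     /* current source line, kept for error messages */

static void skip(void)
{
    for (; isspace((unsigned char)*bp); bp++)
        if (*bp == '\n')
            line_no++;
}

static int match(char c)
{
    skip();
    if (*bp != c)
        return FALSE;
    bp++;
    return TRUE;
}

static int identifier(void)
{
    skip();
    if (!isalpha((unsigned char)*bp))
        return FALSE;
    while (isalnum((unsigned char)*bp))
        bp++;
    return TRUE;
}

static int number(void)
{
    skip();
    if (!isdigit((unsigned char)*bp))
        return FALSE;
    while (isdigit((unsigned char)*bp))
        bp++;
    return TRUE;
}

/* assignment: identifier "=" number ";" */
static int assignment(void)
{
    const char *oldp = bp;         /* first backtracking tool */
    int old_line = line_no;        /* second backtracking tool */

    if (identifier() && match('=') && number() && match(';'))
        return TRUE;
    bp = oldp;                     /* failed: restore the previous state */
    line_no = old_line;
    return FALSE;
}

/* call: identifier "(" ")" ";" */
static int call(void)
{
    const char *oldp = bp;
    int old_line = line_no;

    if (identifier() && match('(') && match(')') && match(';'))
        return TRUE;
    bp = oldp;
    line_no = old_line;
    return FALSE;
}

/* statement: assignment | call */
int statement(void)
{
    return assignment() || call();
}
```

Restoring the line number together with the buffer position keeps
error messages correct even when a failed alternative has already
scanned past a line feed.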

Appendix C has the routines for the basic nonterminals
and terminals of the Tiny-C parser. Together with Appendix B,
it forms a working version of that parser.


D.  DATA  STRUCTURES  FOR  THE  PARSER 

Now is the time to introduce some data structures to
improve the Tiny-C parser. The first one is going to be a
name string structure since all the following tables need
this structure.


1. Name String Implementation

A name string is basically a big character array (or
a string) which holds all the names used in the source file.
Tiny-C has two routines to implement this structure:
the first one is used to add a new name into the name
string, and the second one is used to look for a given name.


add_name()  /* add a name into the name string */
{
Adds a new name into the name string from the
"id_name" global variable. The "id_name" variable holds the
current identifier name all the time. The function
"identifier" in the parser sets this variable whenever it
parses an identifier.
}


find_name()  /* find a name in the name string */
{
Looks for "id_name" in the name string. If found, it
loads the identifier's address into a pointer and returns
TRUE, else returns FALSE.
}


In the current version of Tiny-C, the name string was
implemented completely sequentially. Instead, there could
have been a hashing mechanism, which would be much more
efficient. When testing the whole compiler, it was observed
that a large number of predefined constant names and
variables was making execution slow.
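
The sequential organization described above can be sketched as
follows. The layout is an assumption made for illustration (names
stored back to back, each ended by a NUL character), and the size
NAMESTR_SIZE and the variables name_end and name_ptr are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TRUE  1
#define FALSE 0
#define NAMESTR_SIZE 4096      /* hypothetical capacity */

char name_str[NAMESTR_SIZE];   /* all names, back to back, NUL-terminated */
size_t name_end;               /* first free cell in name_str */
char id_name[32];              /* current identifier, set by the parser */
char *name_ptr;                /* set by find_name() on success */

/* Sequential search: visit every stored name from the beginning. */
int find_name(void)
{
    size_t i = 0;

    while (i < name_end) {
        if (strcmp(&name_str[i], id_name) == 0) {
            name_ptr = &name_str[i];
            return TRUE;
        }
        i += strlen(&name_str[i]) + 1;   /* jump to the next name */
    }
    return FALSE;
}

/* Append id_name, including its terminating NUL, if it still fits. */
int add_name(void)
{
    size_t n = strlen(id_name) + 1;

    if (name_end + n > NAMESTR_SIZE)
        return FALSE;
    memcpy(&name_str[name_end], id_name, n);
    name_end += n;
    return TRUE;
}
```

Every lookup walks the whole string in the worst case, which is
consistent with the slowdown observed when many names are defined.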

2.  Constant  Table 

Constants  are  implicitly  declared  elements.  In 
the  Tiny-C  compiler,  a  constant  table  is  implemented  to  take 
care of them. Since every occurrence of a constant denotes
the  same  declaration,  we  do  not  need  to  check  if  a  constant 
occurs  more  than  once.  We  simply  add  each  constant  into  the 
constant  table  as  it  occurs. 

add_num()  /* add an integer number into constant table */
{
Adds an integer numeric value into the constant table
if it is not in there, and returns its address in a
pointer.
}

In the current version of AM, integers are the only
numeric type. So it is the only numeric type implemented in
Tiny-C, and is the only constant denotation required.

Since input data is an integer for the above routine,
and since the source file is read as a character stream from
the scanner buffer, we need a string-to-numeric conversion
routine, to convert text input into numeric values.


str_num()  /* string to numeric */
{
Takes a string "num_name" (numeric name) which is
set by the "constant()" routine in the parser, calculates
its numeric value, and returns it in the "num_cnst"
(numeric constant) global as an integer.
}

3. Definition Table


In Tiny-C, the preprocessor command "#define" lets
us define constant identifiers. So, a definition table is
implemented for these identifiers.

In case of a "#define" declaration, we need to add
a new constant identifier into the definition table.


add_cnid()  /* add constant identifier */
{
First, checks if the given id-name is already in the
definition table. If so it gives an error, since defining
the same constant-id more than once is not meaningful. Otherwise
it adds the given constant identifier into the definition
table.
}

The  next  problem  in  implementing  constant  identifiers 
is  finding  the  corresponding  values  for  these  constant  id- 
names,  if  they  are  met  when  parsing  a  program. 


find_cnid()  /* find constant identifier */
{
Takes a constant identifier name and looks for it
in the definition table. If found, it sets a pointer to
its place in the definition table and returns TRUE, else it
returns FALSE.
}
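
The two definition table routines can be sketched together. This is a
hedged sketch, not the Tiny-C code: the entry layout struct def_entry,
the capacity DEF_MAX and the explicit name and value arguments are
assumptions introduced here.

```c
#include <string.h>

#define TRUE  1
#define FALSE 0
#define DEF_MAX 64                 /* hypothetical capacity */

struct def_entry {
    char name[32];                 /* constant identifier */
    int value;                     /* value it stands for */
};

struct def_entry def_tab[DEF_MAX];
int def_cnt;
struct def_entry *def_ptr;         /* set by find_cnid() on success */

int find_cnid(const char *name)
{
    int i;

    for (i = 0; i < def_cnt; i++)
        if (strcmp(def_tab[i].name, name) == 0) {
            def_ptr = &def_tab[i];
            return TRUE;
        }
    return FALSE;
}

/* Returns FALSE for a duplicate definition, a full table,
   or a name too long for an entry. */
int add_cnid(const char *name, int value)
{
    if (find_cnid(name) || def_cnt == DEF_MAX
        || strlen(name) >= sizeof def_tab[0].name)
        return FALSE;
    strcpy(def_tab[def_cnt].name, name);
    def_tab[def_cnt].value = value;
    def_cnt++;
    return TRUE;
}
```

In a real compiler the FALSE result for a duplicate would be turned
into an error message by the caller.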




4. Scoping Rule

In classical compilers, symbol tables are primarily
responsible for establishing the scoping rules. The Tiny-C
compiler solves the scoping problem in a different way.

Our Tiny-C compiler has a variable string which holds
all valid variable names in the current scope. When the
parser starts parsing a new function or a new compound
statement (namely a new "block" in block structured language
literature), the parser puts a mark into the variable string
to define the beginning of the new block, and adds the
following variable declarations into the same string.
Whenever the parser goes out of a block, it deletes the
very last block's variables from this string. (Since the
Tiny-C compiler is a one-pass compiler, the deletion of the
variables for the last block is acceptable in this case.) So,
any time a variable is used, the compiler looks for this
variable in the variable string, starting from the end and
working back to the beginning. If found, it obtains a pointer
to the symbol table entry for this variable; if not, it gives
an error message, since that particular variable is unknown
(or out of scope).

find_var()  /* find a variable in variable string */
{
Takes an id-name and looks for it in the variable
string. If found, it sets a pointer to the symbol table
pointing to its place in there and returns TRUE, otherwise
it returns FALSE.
}

We introduced searching for variable names in the
variable string before discussing inserting them. The reason
is that whenever the parser meets a new variable declaration,
it is supposed to add that new variable into both the symbol
table and the variable string. In the Tiny-C compiler, one
single routine does both these duties. Since the symbol
table has not been introduced yet, we have not met this
routine either.

Here, the theory to satisfy the scoping rule is: mark the
beginning of a block in the variable string when starting
to parse a new block, and delete the most recent block's
variables when exiting from it. So any variable which is not
in the variable string is automatically out of scope.
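
The mark-and-delete scheme can be sketched as a small stack. This is
an illustration under assumptions, not the Tiny-C data structure: the
variable string is modeled as an array of entries rather than a
character string, a NULL name serves as the block mark, and the names
enter_block, leave_block, declare_var and lookup_var are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define VAR_MAX 128             /* hypothetical capacity */

struct var_entry {
    const char *name;           /* NULL marks the beginning of a block */
    int sym;                    /* index of the symbol table entry */
};

struct var_entry var_str[VAR_MAX];
int var_cnt;

static int push(const char *name, int sym)
{
    if (var_cnt == VAR_MAX)
        return 0;
    var_str[var_cnt].name = name;
    var_str[var_cnt].sym = sym;
    var_cnt++;
    return 1;
}

/* Entering a function or compound statement: put a mark. */
int enter_block(void)
{
    return push(NULL, -1);
}

/* Leaving a block: delete its variables, then the mark itself. */
void leave_block(void)
{
    while (var_cnt > 0 && var_str[--var_cnt].name != NULL)
        ;
}

int declare_var(const char *name, int sym)
{
    return push(name, sym);
}

/* Search from the end to the beginning, so the innermost
   declaration wins. Returns the symbol index, or -1 if unknown. */
int lookup_var(const char *name)
{
    int i;

    for (i = var_cnt - 1; i >= 0; i--)
        if (var_str[i].name != NULL && strcmp(var_str[i].name, name) == 0)
            return var_str[i].sym;
    return -1;
}
```

Searching backwards is what makes an inner declaration hide an outer
one of the same name without any extra bookkeeping.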

5. Symbol Table


In the Tiny-C implementation, the symbol table is
responsible for variables, function names, label names, and
function arguments.

Let's first start with how to add a new variable into
the symbol table when a variable declaration occurs.


add_var()  /* add variable */
{
Gets a new variable's id_name and gets its type, then adds
it into symbol table and variable string.
}


Similarly, label declarations require label names to
be added into the symbol table, too. But we shouldn't
add labels into the variable string, since in C they do
not satisfy the same scoping rules as variables.


addlabel()  /* add a label into symbol table */
{
Gets a label, and adds it to the end of the symbol table.
}

Whenever the parser meets a new label declaration, it
adds this label into the symbol table by the above routine.
But it must be smart enough not to accept duplicate label
declarations.

One pointer is assigned to point to the beginning of the
very last function in the symbol table. So, when the
parser meets a new label declaration, it first starts from
the beginning of the last function in the symbol table, and
goes all the way down to the end of it, to look for the
same label name. If it finds one, it gives a duplicated
label declaration error, since the same label is not allowed
to be declared twice in the same routine in this language.
The following routine does this job in the Tiny-C compiler.

dup_lbl()  /* is duplicate label? */
{
Checks if the same label name has been declared before.
}

6.  Label  Table 

In the C language, any label referenced by a goto
statement has to be declared somewhere in the same function.
Classically, compilers read the source file twice. But the
number of input/output operations is very important for
total execution speed. Since the Tiny-C compiler is
designed as a "one-pass compiler", we immediately have
this problem: detection of undeclared labels.

Classical two pass compilers read all label declarations
in the first pass. So, in the second pass they can check if
"goto label" statements are valid. When our one-pass Tiny-C
compiler meets a goto statement and the referenced
label name has not been declared yet, it cannot know
whether this label is going to be declared in the following
statements. To solve this problem, Tiny-C implements a
label table, and at the end of every function, it
checks if a referenced but undeclared label exists.

Whenever a label is referenced by a goto statement,
the compiler saves it in the label table by the following
routine.

save_lbl()  /* save label into the label table */
{
Inserts a label which is referenced by a "goto"
statement into the label table for future checking.
}

And at the end of every function, the compiler
checks if the labels referenced by gotos were ever declared
in the function.

check_labels()  /* check labels */
{
Called by the parser at the end of every function body.
Checks if labels in the label table are declared in the
symbol table.
}
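
The deferred check can be sketched as follows. This is a simplified
illustration under assumptions: the labels declared in the current
function are kept in a separate small array instead of the symbol
table, declare_label and is_declared are hypothetical helpers, and
check_labels is assumed to return the number of undeclared labels.

```c
#include <stdio.h>
#include <string.h>

#define LBL_MAX 32                  /* hypothetical capacity */

const char *declared[LBL_MAX];      /* labels declared in this function */
int declared_cnt;
const char *referenced[LBL_MAX];    /* labels named by goto statements */
int referenced_cnt;

static int is_declared(const char *name)
{
    int i;

    for (i = 0; i < declared_cnt; i++)
        if (strcmp(declared[i], name) == 0)
            return 1;
    return 0;
}

/* Called for "name:"; returns 0 on a duplicate or a full table. */
int declare_label(const char *name)
{
    if (is_declared(name) || declared_cnt == LBL_MAX)
        return 0;
    declared[declared_cnt++] = name;
    return 1;
}

/* Called for "goto name;"; the check itself is deferred. */
int save_lbl(const char *name)
{
    if (referenced_cnt == LBL_MAX)
        return 0;
    referenced[referenced_cnt++] = name;
    return 1;
}

/* Called at the end of a function body. */
int check_labels(void)
{
    int i, missing = 0;

    for (i = 0; i < referenced_cnt; i++)
        if (!is_declared(referenced[i])) {
            printf("undeclared label: %s\n", referenced[i]);
            missing++;
        }
    declared_cnt = referenced_cnt = 0;   /* start clean for the next function */
    return missing;
}
```

Since the check runs only once the whole function has been parsed, a
goto may legally precede the declaration of its label, and a single
pass over the source is enough.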

7.  Function  Calls 

Tiny-C  keeps  function  names  and  their  argument  counts  in 
the  symbol  table.  In  case  of  a  function  call,  it  checks 
if  this  function  has  been  called  before,  and  if  it  has  not, 
enters  its  name  and  argument  count  into  the  symbol  table. 


If it has been entered before, it checks if the
argument count in the new function call is the same as
the one in the symbol table. If the argument counts are
not the same, it gives an "inconsistent argument count"
error.

add_fun(fun_no)  /* add function into symbol table */

char *fun_no;
{
Adds a function name and its argument count into the
symbol table in case of a function call, and if it is the
first call of the function. If it is not the first call, the
function is already in the symbol table, so it checks if
argument counts match. In both cases, it returns the
function's function number (basically symbol table entry
number) to the parser, to emit intermediate code.
}
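
The argument count check can be sketched as below. The sketch differs
from the routine above on purpose and makes that explicit: the name
record_call, its arguments, the separate table fun_tab and the two
negative error codes are all assumptions introduced for illustration.

```c
#include <string.h>

#define FUN_MAX 64              /* hypothetical capacity */
#define ARG_MISMATCH (-1)       /* "inconsistent argument count" */
#define TABLE_FULL   (-2)

struct fun_entry {
    const char *name;
    int argc;                   /* argument count seen at the first call */
};

struct fun_entry fun_tab[FUN_MAX];
int fun_cnt;

/* Returns the function number (its table index), or a negative code. */
int record_call(const char *name, int argc)
{
    int i;

    for (i = 0; i < fun_cnt; i++)
        if (strcmp(fun_tab[i].name, name) == 0)
            return fun_tab[i].argc == argc ? i : ARG_MISMATCH;
    if (fun_cnt == FUN_MAX)
        return TABLE_FULL;
    fun_tab[fun_cnt].name = name;       /* first call: enter it */
    fun_tab[fun_cnt].argc = argc;
    return fun_cnt++;
}
```

The first call therefore fixes the expected argument count, and every
later call is compared against it.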


8.  Function  Declarations 

In  the  C  language,  parameter  declarations  follow  a 
function  declaration.  Parameter  names  have  to  be  given 
inside  parentheses  immediately  following  a  function  name, 
and  then  they  have  to  be  declared  one  more  time  with  their 
types. 

The following parameter declarations have to match the
ones given with the function name. Tiny-C has two routines to
get this mechanism to work properly.


chk_prmt()  /* check parameter */
{
At the end of a parameter declaration, this routine
checks if that parameter was given as one of the function's
arguments, or if it is declared more than once. If
everything is proper, it enters the parameter's type into the
symbol table, since the parameter name was already entered
before (when parsing the parameter list following the
function name).
}


-ii  HI 
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And,  at  the  end  of  all  parameter  declarations,  compiler' 
has  to  make  sure  that  all  the  arguments  given  with 
function  name  were  declared  as  parameters. 


chk_prms()  /* check all the parameters */

{

When the parameter declarations are done, checks whether
there is any parameter name in the symbol table without its
type. Since parameter names are entered into the symbol
table when parsing the parameter list, and types are entered
when parsing the parameter declarations that follow, a
parameter whose type is missing has not been declared.

}
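The two checks can be sketched in modern C as follows. This is only an illustration of the mechanism described above, not the Tiny-C source: the `param` record, the limits, the type codes, and the function names `enter_param`, `check_param`, and `check_all_params` are all hypothetical.

```c
#include <string.h>

#define PARAM_MAX 8   /* assumed limit, for illustration only */
#define T_NONE    0   /* name seen in the parameter list, type not yet declared */
#define T_INT     1

struct param { char name[16]; int type; };

struct param params[PARAM_MAX];
int param_cnt = 0;

/* Called while parsing the list after the function name, e.g. f(a, b).
   Returns the entry number, or -1 if the table is full. */
int enter_param(const char *name)
{
    if (param_cnt >= PARAM_MAX)
        return -1;
    strncpy(params[param_cnt].name, name, sizeof params[0].name - 1);
    params[param_cnt].name[sizeof params[0].name - 1] = '\0';
    params[param_cnt].type = T_NONE;
    return param_cnt++;
}

/* Called at the end of one parameter declaration, e.g. "int a;".
   Returns 0 if proper, 1 if the name was not in the parameter list,
   2 if the parameter was already declared. */
int check_param(const char *name, int type)
{
    int i;
    for (i = 0; i < param_cnt; ++i)
        if (strcmp(params[i].name, name) == 0) {
            if (params[i].type != T_NONE)
                return 2;
            params[i].type = type;
            return 0;
        }
    return 1;
}

/* Called after all declarations: counts listed parameters without a type. */
int check_all_params(void)
{
    int i, missing = 0;
    for (i = 0; i < param_cnt; ++i)
        if (params[i].type == T_NONE)
            ++missing;
    return missing;
}
```

A nonzero result from `check_param` or `check_all_params` would correspond to one of the error messages; the actual message numbers are not assumed here.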


These are all the data structures used by the parser to
manage variables, constants, labels, function names and
arguments, and all remaining structures in the Tiny-C parser.
The following section takes the parser one step further and
adds the error checking mechanism.


E.  ERROR  CHECKING 

A  list  of  error  and  warning  messages  used  in  the  Tiny-C 
compiler  is  given  in  Appendix  D. 

Error  and  warning  messages  are  given  by  the 
following  routines: 

err_msg(msg_no)  /* error messages */
char msg_no;

{
    /* increment error counter */
    ++err_cnt;

    /* give line number of the error */
    printf("%d error! ", line_no);

    /* and give the error message */
    switch (msg_no)
    {
        case list for all error messages described in Appendix D.
    }
}


warning(msg_no)  /* warning messages */
char msg_no;

{
    /* give line number of the warning */
    printf("%d warning! ", line_no);

    /* give the message */
    switch (msg_no)
    {
        case list for all warning messages described in Appendix D.
    }
}


F.  INTERMEDIATE  CODE  GENERATION 

In  order  to  generate  code  for  the  target  machine,  first 
the  compiler  has  to  build  a  parse  tree.  Appendix  E  is  a 
list  of  nodes  that  form  Tiny-C  parse  trees. 

Now,  the  same  old  heavy-duty  parser  can  shoulder  one  more 
job:  emissions  of  intermediate  code. 

The following routine emits the intermediate code when
called by the parser. It takes two arguments: the node
itself, and the number of children of this node. If no error
has been detected up to that time, it enters the node into
an emission table (which is in fact a flattened parse tree)
and increments the emit counter.


emit(node, child)  /* emit intermediate code */
char node,   /* node kind to emit */
     child;  /* # of the children belonging to this node */

{
    /* if there is not any error, give emissions */
    if (!err_cnt)
    {
        emitstr[emit_cnt] = node;
        emitchl[emit_cnt] = child;
        ++emit_cnt;
    }
}

G.  POSTPONED  EMISSIONS 

There are times when we do not want to emit code in the
same order as we parse. An assignment statement is a good
example of this situation.

Suppose we have the assignment:

joe = jimmy * 5;

The parse tree for this statement is:

                assignment
               /          \
        variable          multiplication
           |               /          \
          joe         variable      constant
                         |             |
                       jimmy           5

Since our parse tree is in flattened form, the order of
the intermediate code emissions for the above tree should
be:

jimmy, variable, 5, constant, multiplication, joe, variable,
assignment
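The listed order corresponds to a post-order walk of the tree in which, for an assignment, the assigned value is visited before the target variable. A minimal sketch in modern C, assuming a pointer-based tree that Tiny-C itself does not build (the `node` record, the kind names, and `flatten` are hypothetical):

```c
#include <stddef.h>

enum kind { NAME, NUMBER, VARIABLE, CONSTANT, MULTIPLICATION, ASSIGNMENT };

struct node {
    enum kind kind;
    int value;                  /* symbol or constant number, for leaves */
    struct node *left, *right;  /* NULL when absent */
};

#define OUT_MAX 32
enum kind out_kind[OUT_MAX];    /* flattened node kinds, in emission order */
int out_cnt = 0;

/* Post-order walk: children first, then the node itself.
   For an assignment the value (right subtree) goes before the target. */
void flatten(const struct node *n)
{
    if (n == NULL)
        return;
    if (n->kind == ASSIGNMENT) {
        flatten(n->right);
        flatten(n->left);
    } else {
        flatten(n->left);
        flatten(n->right);
    }
    if (out_cnt < OUT_MAX)
        out_kind[out_cnt++] = n->kind;
}
```

Because a recursive descent parser meets the target variable before the value, it cannot produce this order by emitting as it goes.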

But this is not the order in which we parse! There may be
some quick solutions for this particular problem, but the
case might be worse than the one above. Consider the
following statement:

joe[ (jimmy*15) % mary++ ] = joe[5];

Here, the left value is not a simple variable. It is an
array element with a complex index expression.

In summary, there are cases when we simply do not want
to emit immediately. We want to save the emissions, and
emit them at the end of certain expressions.
This type of emission is called "postponed emission."

Up to now, our recursive descent parser has been
suffering from the same problem, but for the sake of
simplicity we ignored it. Now is the time to build some
mechanisms that enable the parser to postpone emissions.

First of all, we have to make our emission tool more
flexible. The following is a revised version of our
"emit-code" function.


emit(node, child)
int node,
    child;

{
    /* if there are not any errors, give emissions */
    if (!err_cnt)
    {
        *(emitptr[0] + (*(emitptr[2]))) = node;
        *(emitptr[1] + (*(emitptr[2]))) = child;
        ++(*(emitptr[2]));
    }
}


As can be seen, this revised version is not restricted to
emitting code into the emission table all the time. It can
emit code into any table which is addressed by the "emitptr"
pointers. That is, by setting these pointers somewhere else,
we can "redirect" the emissions.

The following routine directs emissions into a given
pointer set. The given pointer set is supposed to point to
a table of the same type as the emission table.


drct_emit(emit_ptr, ptr1, ptr2, ptr3)  /* direct emits */
int *emit_ptr[],  /* pointer set to emissions */
    ptr1[],
    ptr2[],
    *ptr3;        /* pointers to new direction */

{
    emit_ptr[0] = ptr1;
    emit_ptr[1] = ptr2;
    emit_ptr[2] = ptr3;
}


As we have seen before, our emit-code routine emits
into a table pointed to by the "emitptr" global emission
pointers. But if we redirect these pointers somewhere
else, don't we lose the address of the previous table? So, we
have to be able to save our previous emission addresses
somewhere. The following routine saves these pointer
addresses in given ones.


rplc_emits(ptr_a, ptr_b)  /* saving emit pointers */
int *ptr_a[],
    *ptr_b[];  /* pointer sets to both emit-tables */

{
    ptr_a[0] = ptr_b[0];
    ptr_a[1] = ptr_b[1];
    ptr_a[2] = ptr_b[2];
}


And the very last problem: we are able to redirect our
"emitptr" emission pointers into some tables (successive
emissions are then entered into these tables), and we are
able to save the previous value of these pointers. But
what about the "postponed emissions", namely the ones we
saved somewhere other than our emission table? The
following routine transfers previously saved emissions from
one table into another.


trns_emits(emit_a, emit_b)  /* transfer emits */
int *emit_a[],  /* destination table pointers */
    *emit_b[];  /* source table pointers */

{
    char i;

    for (i = 0; i < (*(emit_b[2])); ++i)
    {
        *(emit_a[0] + (*(emit_a[2]))) = *(emit_b[0] + i);
        *(emit_a[1] + (*(emit_a[2]))) = *(emit_b[1] + i);
        ++(*(emit_a[2]));
    }
}
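How redirecting, saving, and transferring work together can be sketched for the assignment "joe = jimmy * 5". This is an illustration in modern C under stated assumptions, not the Tiny-C source: the table names, the node codes, the child counts, the helper names `direct_emits` and `transfer_emits`, and the choice to empty the holding table after a transfer are all hypothetical.

```c
#define TBL_MAX 64

int main_str[TBL_MAX], main_chl[TBL_MAX], main_cnt = 0;  /* emission table */
int save_str[TBL_MAX], save_chl[TBL_MAX], save_cnt = 0;  /* holding table  */

/* pointer set: [0] node array, [1] child array, [2] entry counter */
int *emitptr[3] = { main_str, main_chl, &main_cnt };

void emit(int node, int child)
{
    int at = *emitptr[2];
    if (at >= TBL_MAX)
        return;                       /* table full: entry is dropped */
    emitptr[0][at] = node;
    emitptr[1][at] = child;
    ++*emitptr[2];
}

void direct_emits(int *set[3], int *str, int *chl, int *cnt)
{
    set[0] = str;
    set[1] = chl;
    set[2] = cnt;
}

void transfer_emits(int *dst[3], int *src[3])
{
    int i;
    for (i = 0; i < *src[2] && *dst[2] < TBL_MAX; ++i) {
        dst[0][*dst[2]] = src[0][i];
        dst[1][*dst[2]] = src[1][i];
        ++*dst[2];
    }
    *src[2] = 0;                      /* assumption: holding table is reused */
}

enum { NAME = 1, VARIABLE, NUMBER, CONSTANT, MULTIPLICATION, ASSIGNMENT };

/* Emits "joe = jimmy * 5" with the target variable postponed. */
void emit_assignment_example(void)
{
    int *saved[3], *hold[3];

    direct_emits(saved, emitptr[0], emitptr[1], emitptr[2]);  /* save */
    direct_emits(hold, save_str, save_chl, &save_cnt);

    direct_emits(emitptr, hold[0], hold[1], hold[2]);         /* redirect */
    emit(NAME, 0);                    /* joe: parsed first, held back */
    emit(VARIABLE, 1);

    direct_emits(emitptr, saved[0], saved[1], saved[2]);      /* restore */
    emit(NAME, 0);                    /* jimmy */
    emit(VARIABLE, 1);
    emit(NUMBER, 0);                  /* 5 */
    emit(CONSTANT, 1);
    emit(MULTIPLICATION, 2);

    transfer_emits(emitptr, hold);    /* now release the postponed target */
    emit(ASSIGNMENT, 2);
}
```

After the call, the emission table holds the eight entries in the order required for the flattened tree, although the target variable was parsed first.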


H.  CODE  OPTIMIZATION 

Under normal conditions, code optimization can be done
on both intermediate code and target code. When generating
target code, compilers attempt to find the best code
generation sequence, eliminate common sub-expressions, and
minimize the number of temporary variables. After code
generation is done, they pass through the code again one or
two times, for peep-hole optimization, jump optimization,
etc.

Our Tiny-C intermediate code has a flattened tree
structure; it is possible to traverse it as a tree. In order
to do this, we will need some interface routines between
this flattened form and a real tree structure. Then we can
logically look at it as a tree and travel from root to leaves
or vice versa.
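Such interface routines might look like the following sketch in modern C. It assumes that the child table holds the number of direct children of each node and that the nodes are stored in post-order; the thesis does not spell out either detail, and the names `subtree_start` and `child_index` are hypothetical.

```c
#define EMIT_MAX 64

int emitstr[EMIT_MAX];  /* node kinds, stored in post-order */
int emitchl[EMIT_MAX];  /* assumed: number of direct children of each node */

/* Index of the first entry that belongs to the subtree rooted at i.
   The subtrees of the children lie directly in front of the node. */
int subtree_start(int i)
{
    int start = i, c;
    for (c = 0; c < emitchl[i]; ++c)
        start = subtree_start(start - 1);
    return start;
}

/* Index of the k-th child of node i (0 is the leftmost), or -1. */
int child_index(int i, int k)
{
    int n = emitchl[i], pos = i - 1, c;
    if (k < 0 || k >= n)
        return -1;
    for (c = n - 1; c > k; --c)       /* skip the children to the right of k */
        pos = subtree_start(pos) - 1;
    return pos;
}
```

With these two routines a caller can move from a root to any child without building a separate pointer-based tree.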

In  this  thesis  work,  it  was  decided  to  generate  code  as 
quickly  and  simply  as  possible.  So  the  Tiny-C  compiler  uses 
sequential  code  generation,  even  though  it  is  not  the  best 
way  to  do  it. 

Since  our  code  is  going  to  be  source  code  for  the  AM 
assembler,  it  is  not  going  to  be  easy  to  work  on  a  "text" 
file,  to  optimize  it.  At  this  point,  we  can  work  on  our 
intermediate  code  to  make  it  more  effective.  So,  contrary  to 
the  classical  compilers,  our  code  optimization  is  going  to  be 
only  on  intermediate  code,  instead  of  both  intermediate  and 
target  codes. 

There are several things we do in the code optimization
phase:

-  Removing  dead  code 

-  Label/jump  optimization 

-  Emitting  imbedded  assignments 

The  last  one  cannot  be  classified  as  part  of  code 
optimization  phase,  although  we  deliberately  left  it  to  this 
point.  We  will  see  why  pretty  soon. 

1.  Dead Code Elimination

In some cases, the Tiny-C compiler generates dead code.
For instance:


In  the  intermediate  code  list,  there  is  a  node, 
called  "DUMMY".  Sometimes  our  parser  may  emit  some  code,  but 
then  it  may  realize  that  this  code  is  not  necessary.  In  that 
case  emitting  a  "DUMMY"  node  makes  this  previous  code  "out 
of  concern"  or  a  "dummy  statement". 

In  fact,  such  a  tool  is  not  truly  necessary,  but  was  used 
in  early  versions  of  the  compiler.  In  the  following  phases 
this  "DUMMY"  node  was  used  only  in  the  "case"  statement.  Due 
to  constraints  on  time,  it  has  not  been  removed. 

As we discussed before, this thesis is a presentation of
the first version of the Tiny-C compiler, and hopefully a
reference for its future authors, rather than a discussion
about compiler writing techniques.

Nevertheless,  to  simplify  the  tree  we  can  remove  this 
"DUMMY"  node  and  its  children. 

In addition, there may be dead code that is generated by
the compiler. An example:

joe = 5;
goto there;
joe =
++jimmy;
there:

Here the two statements in the third and fourth lines are
dead code. They will never be executed. So, we can remove
this dead code from the parse tree.
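The rule behind this example is that nothing between an unconditional jump and the next label can be reached. A minimal sketch in modern C, under the simplifying assumption that the program is a list with one kind code per statement (the real intermediate code is a flattened tree; the kind codes and `remove_dead_code` are hypothetical):

```c
enum { NOOP, LABEL, GOTO, ASSIGN, INCREMENT };

/* stmt[] holds one kind code per statement, in program order.
   Statements that cannot be reached are replaced by NOOP.
   Returns the number of statements removed. */
int remove_dead_code(int stmt[], int n)
{
    int i, dead = 0, removed = 0;

    for (i = 0; i < n; ++i) {
        if (stmt[i] == LABEL) {
            dead = 0;               /* a label can be reached by a jump */
        } else if (dead && stmt[i] != NOOP) {
            stmt[i] = NOOP;
            ++removed;
        }
        if (stmt[i] == GOTO)
            dead = 1;               /* nothing falls through a goto */
    }
    return removed;
}
```

For the example above, the list {ASSIGN, GOTO, ASSIGN, INCREMENT, LABEL} loses its third and fourth entries.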

2.  Dead Label Elimination

In general, any label declaration is automatically the
beginning of a new basic block. However, if there is no
"goto" for this label, then such a label is part of a larger
basic block.

Having basic blocks as large as possible reduces the
amount of data transfer between registers and memory. In
other words, it reduces the number of "register cleaning"
operations.

So, if we detect labels which are declared but never
used, removing them is going to be an improvement.
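Detecting unused labels takes two passes: one to mark every label that some "goto" refers to, and one to replace the unmarked labels. A sketch in modern C, assuming a simplified representation with one kind code and one label number per entry (the kind codes, `LABEL_MAX`, and `remove_dead_labels` are hypothetical, and other jump-producing constructs are ignored):

```c
enum { NOOP, LABEL, GOTO, OTHER };

#define LABEL_MAX 32   /* assumed limit on label numbers */

/* kind[i] is the node kind; arg[i] is the label number for LABEL and
   GOTO entries. Every LABEL that no GOTO refers to becomes NOOP.
   Returns the number of labels removed. */
int remove_dead_labels(int kind[], const int arg[], int n)
{
    int used[LABEL_MAX] = { 0 };
    int i, removed = 0;

    for (i = 0; i < n; ++i)                       /* pass 1: mark */
        if (kind[i] == GOTO && arg[i] >= 0 && arg[i] < LABEL_MAX)
            used[arg[i]] = 1;

    for (i = 0; i < n; ++i)                       /* pass 2: replace */
        if (kind[i] == LABEL && arg[i] >= 0 && arg[i] < LABEL_MAX
            && !used[arg[i]]) {
            kind[i] = NOOP;
            ++removed;
        }
    return removed;
}
```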

3.  Temporary  Variables  in  the  Front  End 

There  is  one  more  thing  that  has  to  be  done  when 
passing  over  the  intermediate  code  for  optimizing  purposes. 

In the parser, arithmetic expressions following a
"switch" reserved word are assigned to some temporary
variables. These temporary variables are represented by
"TVAR" nodes, with a temporary variable number. Since the
results of those arithmetic expressions are assigned to
"TVAR" nodes, and these variables are compared with "case"
labels, we have to allocate memory for these nodes just as
we are going to do for normal variables. The values of
"TVAR" nodes may or may not reside in their allocated memory
locations; they may be kept in registers, too. The register
manager in the following section will treat them just
like variable nodes.

In fact, all variables are referred to by their
symbol numbers, or their symbol table entry numbers. And at
this point, we know our symbol table length. So we can assign
some new symbol numbers to these "TVAR" nodes, and change
their names to "VARB" variable nodes. Then the register
manager can take care of the rest.

4.  Code  Optimization.  Phase  1. 

The  following  routine  is  the  first  part  of  the 
intermediate  code  optimization,  and  is  called  just  after  the 
parser. 


frstopt()  /* first pass of optimization */

{

- Detects dead code and replaces it with "NOOP" no
operation nodes.

- Detects unused labels and replaces them with "NOOP"
nodes.

- Replaces "TVAR" nodes with "VARB" nodes and assigns
them new symbol numbers starting from the last symbol number
in the symbol table.

}


5.  Separation  of  Front  End  and  Code  Generator 

Up to now, our intermediate code has been in memory,
in its allocated location (the emission table). Because the
emission table has a fixed size, it has to be large enough
to hold the largest program. If the input source file is
too big to fit into our emission table, Tiny-C responds
with an error message. (This is one of the reasons
it is called Tiny-C).

It  is  possible  to  pass  this  emission  table  to  the  second, 
target  machine  dependent  part  of  compiler,  but  it  would 
not  be  efficient. 


There  is  a  logical  separation  between  parser/ 
intermediate  code  generator  and  target  code  generator. 
The  first  part  is  totally  language  dependent  and  machine 
independent,  and  the  second  part  is  machine  dependent  but 
language  independent.  So,  putting  a  physical  separation 
between  these  logically  independent  units  is  always  a  good 
idea,  and  has  been  implemented  in  most  compilers. 

For  this  reason  we  should  end  the  first  part  of 
this  compiler  here.  But  before  doing  this,  we  have  to  pass 
the  outcome  of  this  part  to  the  second  part  of  compiler 
(basically,  the  code  generator  of  Tiny-C). 

The code generator is going to need the intermediate code,
a symbol table, a constant table, and the number of
temporary variables used by the parser. All this information
has to be written in some place for later access by the code
generator.

But  we  have  a  last  minute  problem  here,  which  we 
deliberately  ignored  up  to  now.  This  is  "imbedded 
assignments." 

6.  Imbedded  Assignments 

In the C language the statement:

joe = jimmy++ * 5;

is in fact two different statements:

joe = jimmy * 5;    and a following:
++jimmy;

The second statement here is an "imbedded assignment." We
did not emit code for imbedded assignments up to now, and in
fact we have ignored this problem on purpose, because right
now, when writing intermediate code into a quad file, we
can simply emit this code without any effort.


7.  Code Optimization. Phase 2. The Quad File Filter

The following routine is the second part of the
intermediate code optimizer. It is called just after the
first-pass optimizer.


scndopt()  /* second-pass optimization */

{

Creates a quad file named "TC.QQQ" and:

- Writes intermediate code in this file, without
"NOOP" codes and with additional imbedded assignments.

- Marks end of intermediate code

- Writes symbol table

- Writes number of the temporary variables (TVARs)

- Writes constant table

- Writes name string

- And closes that quad file.

}


I.  DATA  STRUCTURES  FOR  CODE  GENERATION 

The  final  step  is  code  generation  for  the  Abstract 
Machine. 

As discussed before, the output of this compiler is not
going to be binary code which is ready to be linked and
run. It is going to be a source file for the AM assembler,
so it will be readable.

Since this is the second part of the compiler, it
receives the work done in the first part. The following
routine reads a Tiny-C quad file from the disk.

read_quad()  /* read quad file */

{

Reads the quad file from disk in the sequence intermediate
code, symbol table, constant table and name string.

}

Now  the  compiler  has  all  the  information  it  needs  to  go 
ahead  and  generate  code.  But  right  now  it  does  not  have  any 
tools  to  do  this.  We  build  some  tools  first,  to  help  the 
code  generation  phase. 

The target machine AM theoretically has an unlimited
number of registers. This is not realistic. So, the Tiny-C
compiler assumes that AM has a reasonable number of
registers, and tries to manage them properly.

Keeping all the variables and all the intermediate
results in registers would be awfully nice. But since this is
impossible, and we are going to run out of registers after
generating a piece of code, we will need a "register
manager" to handle the limited number of registers
properly. The Tiny-C compiler does not have a single
"register manager" routine. Instead, we will introduce a
couple of routines which manage the AM registers.

1.  Address Descriptors

As is known, a compiler cannot keep all the variables
in registers all the time. So, it is obvious that a
variable may be in a register, or in its allocated memory
location, or both, at a particular time. A compiler needs a
mechanism to keep track of the current addresses of all
variables. The following routine sets symbol addresses
according to the given parameters.


addr_dscr(sym_no, status, reg_no)  /* symbol addr. descriptor */
char sym_no,   /* symbol number   */
     status,   /* address status  */
     reg_no;   /* register number */

{

Sets current addresses of variables. All variables have
an 8-bit address descriptor. Status may be "in-register",
"in-memory" or "in-both". If the 7th bit of this descriptor
is 1, the variable is in its allocated memory location. If
the value stored in bits 0 to 6 is zero, the variable is not
in any register. If it is different from zero, that value
minus one gives the number of the register in which the
symbol is stored.

}
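The bit layout just described can be written down as a few helper routines. This is a sketch in modern C of the encoding only; the macro and function names are hypothetical and do not appear in the Tiny-C source.

```c
#define IN_MEMORY 0x80   /* bit 7: value is in its allocated memory location */
#define REG_MASK  0x7F   /* bits 0 to 6: register number plus one, 0 = none  */

/* Builds a descriptor. Pass reg_no < 0 when the value is in no register. */
unsigned char make_descr(int in_memory, int reg_no)
{
    unsigned char d = 0;
    if (in_memory)
        d |= IN_MEMORY;
    if (reg_no >= 0)
        d |= (unsigned char)((reg_no + 1) & REG_MASK);
    return d;
}

int descr_in_memory(unsigned char d)   { return (d & IN_MEMORY) != 0; }
int descr_in_register(unsigned char d) { return (d & REG_MASK) != 0; }

/* Register number stored in the descriptor, or -1 if there is none. */
int descr_register(unsigned char d)    { return (int)(d & REG_MASK) - 1; }
```

Storing the register number plus one keeps zero free to mean "not in any register", so register 0 remains usable.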


Exactly  the  same  problem  exists  for  constants.  Even 
though  constant  values  are  fixed  and  they  reside  in  a 
constant  table  all  the  time,  the  compiler  should  not  transfer 
a  constant  value  into  a  register  if  it  is  already  in  one. 
The  following  routine  sets  a  constant  address  descriptor. 


cnst_adr_dscr(cnst_no, status, reg_no)
int  cnst_no;  /* constant number */
char status,   /* status          */
     reg_no;   /* register number */

{

Sets current addresses of constants. All constants have
an 8-bit address descriptor. If the 7th bit of this value
is 1, and all others are zero, the constant is not in any
register. Otherwise the value of this descriptor gives the
number of the register in which the constant resides.

}


2.  Temporary  Management 

There is one more address problem. When
calculating an arithmetic expression, we may have a couple
of temporary results. For instance:

joe = jimmy * 5 + joe * 3;

The statement has the following parse tree:

                   assignment
                  /          \
           variable           addition
              |              /        \
             joe   multiplication    multiplication
                    /        \         /        \
              variable   constant  variable   constant
                 |          |         |          |
               jimmy        5        joe         3

Here,  the  compiler  calculates  "jimmy  *  5"  and  "joe  *  3" 
first.  Since  it  has  to  keep  these  results  somewhere 
temporarily,  we  have  to  manage  these  temporaries  and 
keep  track  of  their  addresses. 

The Tiny-C compiler manages the addresses of temporaries
in exactly the same way as it does for variables. In
addition, it may dispose of a temporary, so that the same
temporary number can be used somewhere else later.

dispose_temp(temp_no)  /* dispose temporary */
char temp_no;

{

Disposes of the given temporary variable.

}

When the code generator finishes a statement completely,
there is no need for any temporary, in Tiny-C's sequential
code generation order. So at the end of every statement,
the compiler disposes of the temporary variables.

clean_temps()  /* clean all temporaries */

{

Disposes of all the temporaries.

}


The compiler needs a new temporary every time it
calculates a temporary result. So, the following routine
provides new temporaries to the code generator.

get_a_temp(temp_no)  /* get a temporary variable */
char *temp_no;

{

Finds an unused temporary, returns its number to the code
generator, and marks it "used."

}
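The three temporary routines amount to a small pool of "used" flags. A minimal sketch in modern C, assuming a fixed pool size and a lowest-free-number policy (`TEMP_MAX`, the flag array, and the function names are hypothetical; the thesis does not state how a free temporary is chosen):

```c
#define TEMP_MAX 16           /* assumed pool size, for illustration only */

char temp_used[TEMP_MAX];     /* 1 while the temporary number is in use */

/* Finds an unused temporary, marks it used, and returns its number,
   or -1 when every temporary is in use. */
int temp_get(void)
{
    int t;
    for (t = 0; t < TEMP_MAX; ++t)
        if (!temp_used[t]) {
            temp_used[t] = 1;
            return t;
        }
    return -1;
}

/* Makes one temporary number available again. */
void temp_dispose(int t)
{
    if (t >= 0 && t < TEMP_MAX)
        temp_used[t] = 0;
}

/* Called at the end of every statement: frees all temporaries. */
void temp_clean_all(void)
{
    int t;
    for (t = 0; t < TEMP_MAX; ++t)
        temp_used[t] = 0;
}
```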

3.  Finding  Current  Addresses 

The  compiler  should  be  able  to  figure  out  any  given 
token’s  address  at  any  time.  Tiny-C  uses  the  following 
routines  for  this  purpose. 


is_inreg(token_no, kind)  /* is token in a register? */
int  token_no;  /* token number */
char kind;      /* token kind   */

{

Takes a token kind (variable, constant or temporary
variable) and its token number, returns TRUE if it is stored
in a register, else returns FALSE.

}

If a particular token is in a register, it can be
determined which register this is.

get_reg_num(token_no, reg, kind)  /* get register number */
int  token_no;  /* token number    */
char *reg,      /* register number */
     kind;      /* token kind      */

{

Takes a token number and its kind, and returns the
register number through the "reg" pointer.

}


After some operations, variable values may be only
in registers, and may not be in their allocated memory
locations. The compiler should determine whether a given
variable is in memory, to avoid transferring it into its
memory location unnecessarily.

isinmem(sym_no)  /* is symbol in memory? */
char sym_no;     /* variable's symbol number */

{

Checks the variable's address descriptor, returns TRUE if
it is in memory, else returns FALSE.

}


4.  Register  Management 

A register can hold just one single value. But in
the Tiny-C compiler, this value can belong to more than
one token at the same time; for instance, the same register
can keep two variables, one constant and two temporary
variables in it if they all have the same value at that
particular time.

We will define the structure of the register manager
like this:

#define MXREG 16  /* # of target machine's registers */
#define MXVAR 5   /* maximum # of variables that
                     one single register can hold */
int *regtr[MXREG],          /* pointers to register variables */
    reg_arr[MXREG*MXVAR];   /* register variable array */

So, every register has MXVAR locations in the register
array (reg_arr). These are token descriptors and
show which tokens (variables, constants and temporaries)
that particular register has at any time. The size of the
register array is MXREG times MXVAR.

The  register  array  keeps  the  names  of  the  tokens  which 
are  loaded  in  some  registers. 

Since  particular  parts  of  the  register  array  belong  to 
particular  registers,  we  can  easily  figure  out  which  tokens 
are  in  which  registers,  or  which  register  has  which  tokens. 

In  order  to  calculate  a  new  result,  the  compiler  has  to 
find  an  unused  register  to  load  the  value.  The  following 
routine  provides  free  registers  to  the  code  generator. 

get_a_reg(reg)  /* get a register */
char *reg;      /* register number */

{

Checks the register array locations of every register.
If it finds a blank one, it returns this register to the
code generator. If they are all occupied, it evacuates one
of them at random and returns it.

}
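A sketch in modern C of how the register array can answer "which register is free". It follows the layout above (MXVAR slots per register) but makes several assumptions: an empty slot is marked with 0, the victim is chosen round-robin instead of at random so that the behaviour is repeatable, and evacuation only clears the slots, whereas the real compiler also generates code to store the members. All function names are hypothetical.

```c
#include <string.h>

#define MXREG 16
#define MXVAR 5
#define EMPTY 0                 /* assumed marker for an unused slot */

int reg_arr[MXREG * MXVAR];     /* MXVAR token slots per register */
int next_victim = 0;            /* round-robin stand-in for a random choice */

/* A register is free when none of its slots holds a token. */
int reg_is_free(int r)
{
    int k;
    for (k = 0; k < MXVAR; ++k)
        if (reg_arr[r * MXVAR + k] != EMPTY)
            return 0;
    return 1;
}

/* Records that register r now also holds the (nonzero) token.
   Returns 1 on success, 0 when all slots of r are taken. */
int reg_occupy(int r, int token)
{
    int k;
    for (k = 0; k < MXVAR; ++k)
        if (reg_arr[r * MXVAR + k] == EMPTY) {
            reg_arr[r * MXVAR + k] = token;
            return 1;
        }
    return 0;
}

/* Clears every slot of register r. */
void reg_evacuate(int r)
{
    memset(&reg_arr[r * MXVAR], 0, MXVAR * sizeof reg_arr[0]);
}

/* Returns a free register, evacuating one if none is free. */
int reg_get(void)
{
    int r;
    for (r = 0; r < MXREG; ++r)
        if (reg_is_free(r))
            return r;
    r = next_victim;
    next_victim = (next_victim + 1) % MXREG;
    reg_evacuate(r);
    return r;
}
```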

The  compiler  should  be  able  to  load  a  token  from  its 
memory  location  into  one  of  the  registers.  The  following 
routine  is  used  for  this  purpose. 


load_in_reg(token_no, reg, kind)  /* load into a register */
int  token_no;  /* token number    */
char *reg,      /* register number */
     kind;      /* token kind      */

{

Takes a register, a token number and its kind, and
generates code to load the token into the given register.

}


After loading this token into a register, its address
descriptor has to be set to "in both register and memory",
and the register manager should set the members of this
particular register.

Suppose we load an integer value "3" into a
register. The register manager should know that the register
is keeping a constant value "3", or which constant number
from our constant table is in that register.

Then, suppose we assign this constant to a variable,
as in the statement "joe=3;".

The  register  manager  should  mark  that  this  particular 
register  has  a  constant  and  a  variable  in  it. 

The  following  routine  helps  the  register  manager  to  state 
that  a  register  is  now  holding  a  given  token. 


occupy_reg(token_no, reg, kind)  /* occupy register */
int  token_no;  /* token number    */
char *reg,      /* register number */
     kind;      /* token kind      */

{

Enters the given token into the given register's register
array location, to mark that this register is holding the
given token.

}

There may be times when the compiler assigns a new
value to a variable, but that particular variable may
have been stored in a different register before. Since we
want to bind a new register to the old variable, we want to
release its old register.

rel_sym_reg(sym_no)  /* release symbol's register */
int sym_no;

{

Takes a variable, finds its register, and deletes its
membership in this register.

}


Sometimes  the  compiler  has  to  store  a  token  from  its 
register  into  its  memory  location.  The  following  two  routines 
do  this  chore. 


eva_symbol(sym_no, reg_no)  /* evacuate register from symbol */
char sym_no, reg_no;

{

Generates code to transfer the symbol from the register
into memory. Then sets the symbol's address descriptor to
"in memory" only.

}

eva_temp(temp_no, reg_no)  /* take temporary out of register */
char temp_no, reg_no;

{

Generates code to transfer the temporary from the register
into memory. Then sets its address descriptor to "in memory"
only.

}


And there are some cases when the compiler wants to empty
a register completely. For instance, we may do this to
release a register.

eva_reg(reg_no)  /* evacuate register */
char reg_no;

{

Takes a register number, finds all its members in the
register array, and generates code to transfer those members
to their memory locations if they are not already there.
(Actually, it uses the above two routines.)

}


Before  getting  out  of  a  basic  block,  the  compiler  should 
empty  all  registers.  The  following  routine  does  this  task. 


clean_regs()  /* clean registers */

{

Calls the "evacuate register" routine for all registers.

}


5.  Operands  for  the  Operators 

In the actual code generation phase, the compiler
looks for an operator and, according to the operator's type,
requests the registers for the operands. The following two
routines return integer operands in registers.

load_two_oprnd(j, r1, r2, step)  /* load two integer operands */
int  j;         /* pointer to int. code       */
char *r1, *r2,  /* registers                  */
     *step;     /* # of the total steps taken */

{

Gets two operands from the intermediate code, loads them
into two available registers, and returns these register
numbers to the code generator. Since our parse tree is in
a flattened form, the code generator needs to know where it
has arrived in that array-tree after loading these operands.
So, "step" is a variable that tells how many steps have
been consumed in the intermediate code.

}

load_one_oprnd(i, reg, step)  /* load one integer operand */
int  i;      /* pointer to int. code       */
char *reg,   /* register number            */
     *step;  /* # of the total steps taken */

{

Loads the next operand in the parse tree into a
register, and returns the register number together with the
number of steps walked in the parse tree.

}

The Abstract Machine AM has some boolean operators
that accept only boolean operands. But everything in Tiny-C
has integer type. So the code generator should have some
tools to convert integer values into booleans. The
following two routines provide boolean operands for boolean
operators, whenever they are needed.

two_bool(j, r1, r2, step)  /* load two boolean operands */
int  j;         /* pointer to int. code       */
char *r1, *r2,  /* registers                  */
     *step;     /* # of the total steps taken */

{

Loads two operands. If they have integer values, it loads
the corresponding boolean values into registers and returns
them to the code generator.

}

one_bool(i, reg_no, step)  /* load one boolean oprnd */
int  i;        /* pointer to int. code       */
char *reg_no,  /* register number            */
     *step;    /* # of the total steps taken */

{

Returns one boolean operand in a register.

}


J.  CODE  GENERATION 


In the Tiny-C compiler, the main routine in the code
generator is a large switch statement, as is used in most
compilers. The compiler generates code for the data segment
first, which is just a memory allocation routine for the
symbols. Then the code segment comes as the actual code
generation phase. The following routine is a subset of the
code generation routine for the code segment. Each case
element dispatches to the code emitter for that case.


code_seg()  /* give code segment */

{
    int  i;       /* index variable   */
    char r1, r2,  /* register numbers */
         step;    /* # of the steps taken on int. code */

    /* walk emit array from beginning to emit-end */
    for (i = 0; i < emitend; ++i)

        /* if node has children, (if it is not a leaf) */
        if (emitchl[i] != 0)

            switch (emitstr[i])
            {
                case IADD:
                    /* integer addition */
                    code_iadd(i, &step);
                    break;

                case MEND:
                    /* end of main function */
                    fprintf(fl, "    stop\n");
                    break;
            }
}


The following routine is used by the above "code_seg()"
routine and emits code for integer additions.

code_iadd(i, step)  /* integer addition */
int  i;
char *step;  /* # of the steps taken on int. code */

{
    char r1, r2,   /* register numbers          */
         temp_no;  /* temporary variable number */

    /* load two operands */
    load_two_oprnd(i-1, &r1, &r2, step);

    /* they both might be in the same register,
       if so, allocate one more register */
    if (r1 == r2)
    {
        get_a_reg(&r1);
        fprintf(fl, "    mov   r(0:%d), r(0:%d)\n", r2, r1);
    }

    /* since addition will be loaded in r1, evacuate it first */
    eva_reg(r1);

    /* code for integer addition */
    fprintf(fl, "    add   r(0:%d), r(0:%d)\n", r2, r1);

    /* give a number to this temporary result */
    get_a_temp(&temp_no);
    occupy_reg(temp_no, r1, TEMP);

    /* set temporary's address descriptor */
    temp_var[temp_no] = r1 + 1;

    /* validate emission array for sequential code generation */
    emitstr[i] = TEMP;
    emitchl[i] = *step + 1;
    emitstr[i-1] = temp_no;
}


Some sample C programs and the code generated for them by
the Tiny-C compiler can be found in Appendix F.
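
The dispatch scheme of this section can be sketched in a few lines of self-contained C. This is only an illustration under stated assumptions, not the Tiny-C source: the struct node layout, the numeric operator codes, the in-memory output buffer, and the naive register counter are all hypothetical stand-ins for the emit arrays, the output file, and the register manager of the real compiler. Only the operator names IADD and MEND and the shape of the emitted "mov", "add", and "stop" lines are taken from the listings above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical operator codes; the thesis names IADD and MEND,
   but the numeric values here are assumptions. */
enum { IADD = 1, MEND = 2 };

struct node {          /* one entry of a simplified emit array */
    int op;            /* operator code, 0 for a leaf */
    int r1, r2;        /* registers that already hold the operands */
};

static char out[256];  /* generated assembler text */
static int  next_reg;  /* naive allocator: next unused register */

static void emit(const char *line)
{
    strncat(out, line, sizeof out - strlen(out) - 1);
}

/* Emit an integer addition whose result ends up in r1. */
static void gen_iadd(int r1, int r2)
{
    char line[64];

    if (r1 == r2) {    /* both operands share a register */
        r1 = next_reg++;
        snprintf(line, sizeof line, " mov r(0:%d), r(0:%d)\n", r2, r1);
        emit(line);
    }
    snprintf(line, sizeof line, " add r(0:%d), r(0:%d)\n", r2, r1);
    emit(line);
}

/* Walk the array and dispatch on each operator, as code_seg() does. */
static const char *gen_code(const struct node *code, int n)
{
    int i;

    out[0] = '\0';
    next_reg = 1;      /* assume r0 is already occupied */
    for (i = 0; i < n; ++i) {
        if (code[i].op == 0)
            continue;  /* leaves emit nothing */
        switch (code[i].op) {
        case IADD:
            gen_iadd(code[i].r1, code[i].r2);
            break;
        case MEND:
            emit(" stop\n");
            break;
        }
    }
    return out;
}
```

Calling gen_code() on an array holding one IADD node (both operands in register 0) followed by one MEND node yields a "mov" into a fresh register, an "add", and a final "stop". Keeping each operator's emitter in its own function, as the thesis does with code_iadd(), keeps the central switch short as more operators are added.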


V.  CONCLUSION 


Precise, understandable and enforceable interface
standards can provide a way to improve efforts toward
portable software. In the Tiny-C implementation we showed a
way to improve the programming capabilities of AM, and
encouraged programmers to use such a portable and
standardizable machine in high level languages.

Unfortunately, this implementation is not completely
satisfactory. Because of restricted capabilities in the
target AM machine, the Tiny-C compiler does not fully
support application programming. Some of these restrictions
are:


- Based on the principle of resource abstraction, AM has
  strictly defined data types. Since it presently does not
  support conversion between two types, it is a higher
  level concept than the "C" language. So, contrary to
  usual implementations, this thesis had an opposite
  direction: production of a lower level tool in a higher
  level environment.

- The AM abstract machine does not yet have a complete
  linker. So, the user is forced to keep the whole
  program and input/output library in one single module,
  which is extremely inconvenient in application
  environments.

- The current version of AM is an emulator, rather than
  hardware. Even though this is convenient for a
  development phase, it is not going to be an easy-to-use
  product for users.

So, further development that could be done for an
improved AM environment might include:

- A linker for AM

- Type conversion between AM data types

- An input/output library for the Tiny-C compiler

- A Tiny-C code generator for AM machine code (instead of
  a source generator for the AM Assembler)

- Given improvements in AM, an extended version of the
  Tiny-C compiler to cover the whole Tiny-C language
  grammar

- A compiler version of AM.


APPENDIX A


GRAMMAR  FOR  TINY-C  LANGUAGE 


PROGRAM :

program:
    <pre-processor>* <data-definition>*
    <function-definition>+


PRE-PROCESSOR :

pre-processor:
    "#define" <file-definition> |
    "#include" <file-definition>

file-definition:
    '"' <filename> '"' |
    '<' <filename> '>'

filename:
    <identifier> <filetype>

filetype:
    '.' <identifier>


DATA DEFINITIONS :

data-definition:
    <sc-specifier>? <declaration>

sc-specifier:
    "auto" |
    "static" |
    "extern" |
    "register"

declaration:
    <type-specifier> <variable-declaration-list>

type-specifier:
    "char" |
    "short" |
    "int" |
    "long" |
    "unsigned" |
    "float" |
    "double"

variable-declaration-list:
    <variable-declaration> <more-variable-declarations>*

more-variable-declarations:
    ',' <variable-declaration>


DECLARATIONS :

variable-declaration:
    "*"? <identifier> <index-declaration>? <initializer>?

index-declaration:
    "[" <constant-expression>(1) "]"

initializer:
    "=" <primary>

primary:
    <identifier> |
    <constant> |
    <char-definition> |
    <string>

char-definition:
    "'" <character>

string:
    <character>*


FUNCTION DEFINITION :

function-definition:
    <type-specifier>? <function-declaration> <function-body>

function-declaration:
    <identifier> "(" <identifier-list>? ")"

identifier-list:
    <identifier> <more-identifiers>*

more-identifiers:
    ',' <identifier>

function-body:
    <type-decl-list> <compound-statement>


PARAMETER DECLARATIONS :

type-declaration-list:
    <parameter-declaration>+

parameter-declaration:
    <type-specifier> <parameter-declaration-list>

parameter-declaration-list:
    <parameter> <more-parameters>*

more-parameters:
    ',' <parameter>

parameter:
    '*'? <identifier> <index-declaration>?


STATEMENTS :

statement:
    <compound-statement> |
    <function-call> |
    <assignment-statement> |
    <if-statement> |
    <while-statement> |
    <do-statement> |
    <for-statement> |
    <switch-statement> |
    <break-statement> |
    "continue" |
    <return-statement> |
    <goto-statement> |
    <label> |
    ";"

compound-statement:
    "{" <declaration>* <statement>+ "}"

function-call:
    <identifier> '(' <expression-list> ')'

expression-list:
    <expression> <more-expressions>*

more-expressions:
    ',' <expression>

assignment-statement:
    <assignment> |
    <incremental-expression>

assignment:
    <lvalue> "=" <logic-expression> |
    <lvalue> <shift-assignment-op> <shift-expression> |
    <lvalue> <bitwise-assignment-op> <bitwise-expression>

shift-assignment-op:
    "+=" | "-=" | "*=" | "/=" | "%=" | ">>=" | "<<="

bitwise-assignment-op:
    "&=" | "^=" | "|="

incremental-expression:
    "++" <lvalue> |
    "--" <lvalue> |
    <lvalue> "++" |
    <lvalue> "--"

if-statement:
    "if" "(" <logic-expression> ")" <statement>
    <else-statement>?

else-statement:
    "else" <statement>

while-statement:
    "while" "(" <logic-expression> ")" <statement>

do-statement:
    "do" <statement> "while" '(' <logic-expression> ')'

for-statement:
    "for" "(" <assignment-list>? ";" <logic-expression>
    ";" <assignment-list>? ")" <statement>

assignment-list:
    <assignment-statement> <more-assignments>*

more-assignments:
    ',' <assignment-statement>

switch-statement:
    "switch" "(" <arithmetic-expression> ")" "{"
    <case-stmt>+ "}"

case-stmt:
    "case" <constant-expression> ":" <statement>* |
    "default" ":" <statement>*

break-statement:
    "break" ';'

return-statement:
    "return" <expression> ';'

goto-statement:
    "goto" <identifier> ";"

label:
    <identifier>


EXPRESSIONS :

expression:
    <string> |
    <pointer-expression> |
    <address-expression> |
    <logic-expression> |
    <incremental-expression>

pointer-expression:
    "*" <array-element> |
    "*" <identifier> |
    "*" "(" <arith-expr> ")"

address-expression:
    "&" <array-element> |
    "&" <identifier>

logic-expression:
    <logic-term> <more-logic-terms>*

more-logic-terms:
    "||" <logic-term>

logic-term:
    <logic-factor> <more-logic-factors>*

more-logic-factors:
    "&&" <logic-factor>

logic-factor:
    '!'? <bitwise-expression> |
    '!'? "(" <logic-expression> ")"

bitwise-expression:
    '~'? <bitwise-term> <more-bitwise-terms>*

more-bitwise-terms:
    "|" <bitwise-term>

bitwise-term:
    <bitwise-factor> <more-bitwise-factors>*

more-bitwise-factors:
    '^' <bitwise-factor>

bitwise-factor:
    <bitwise-element> <more-bitwise-elements>*

more-bitwise-elements:
    '&' <bitwise-element>

bitwise-element:
    <compare-expression> |
    "(" <bitwise-expression> ")"

compare-expression:
    <compare-term> <more-compare-terms>*

more-compare-terms:
    <equality-op> <compare-term>

equality-op:
    "==" | "!="

compare-term:
    <compare-factor> <more-compare-factors>*

more-compare-factors:
    <relation-op> <compare-factor>

relation-op:
    "<" | ">" | "<=" | ">="

compare-factor:
    <shift-expression> |
    "(" <compare-expression> ")"

shift-expression:
    <lvalue> <shift-op> <arith-expression> |
    <arith-expression>

shift-op:
    ">>" | "<<"

arith-expression:
    '-'? <term> <more-terms>*

more-terms:
    <add-op> <term>

add-op:
    "-" | "+"

term:
    <factor> <more-factors>*

more-factors:
    <mult-op> <factor>

mult-op:
    "*" | "/" | "%"

factor:
    "(" <arith-expr> ")" |
    <constant-expression> |
    <character-definition> |
    <function-call> |
    <incremental-expression> |
    <lvalue>

constant-expression:
    <constant> |
    <constant-identifier>

lvalue:
    <array-element> |
    <identifier> |
    <pointer-expression>

array-element:
    <identifier> <index>

index:
    "[" <arith-expression> "]"


SEMANTIC CONSTRAINTS :

(1)  Prohibited for extern and parameter declarations.
     Mandatory for others.
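
The three repetition operators of this grammar translate into parser code in a regular way. The following sketch illustrates that translation for the arith-expression rule only. It is a simplified illustration, not the thesis parser: the single-digit term(), the parse() driver, and the string-based input buffer are assumptions made to keep the example self-contained, while the names match, bufp, art_expr and more_term follow the naming used in the parser listings.

```c
#define TRUE  1
#define FALSE 0

static const char *buf;   /* input text (assumed NUL-terminated) */
static int bufp;          /* index of the next unread character */

/* Consume c if it is the next character. */
static int match(int c)
{
    if (buf[bufp] != c) return FALSE;
    ++bufp;
    return TRUE;
}

/* term: a single decimal digit (a deliberately tiny stand-in). */
static int term(void)
{
    if (buf[bufp] < '0' || buf[bufp] > '9') return FALSE;
    ++bufp;
    return TRUE;
}

/* more-terms: add-op term; restores bufp if the pair is incomplete. */
static int more_term(void)
{
    int oldp = bufp;

    if (!match('+') && !match('-')) goto quit;
    if (!term()) goto quit;
    return TRUE;
quit:
    bufp = oldp;              /* backtrack to where we started */
    return FALSE;
}

/* arith-expression: '-'? term more-terms* */
static int art_expr(void)
{
    int oldp = bufp;

    if (match('-'))           /* X? : try once, ignore failure */
        ;
    if (!term()) goto quit;   /* X  : must succeed */
    while (more_term())       /* X* : repeat until it fails */
        ;
    return TRUE;
quit:
    bufp = oldp;
    return FALSE;
}

/* Parse text; returns the number of characters consumed, or -1. */
static int parse(const char *text)
{
    buf = text;
    bufp = 0;
    return art_expr() ? bufp : -1;
}
```

With this sketch, parse("-1+2-3") consumes all six characters, while parse("1+") consumes only the leading digit because more_term() restores the buffer pointer when no term follows the operator. An X+ item is handled as one mandatory call followed by the same while loop used for X*.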


APPENDIX  B 


TINY-C  PARSER  VERSION  1 


extern char buf[], nextch, func_end;
extern int  bufp, glbptr, line_no;


program()  /* Tiny-C Program */
{
    while (preprcs())
        ;
    while (data_def())
        ;
    if (!func_def()) goto quit;
    while (!match(EOF))
    {
        func_end=FALSE;
        if (!func_def())
            goto quit;
    }
    return(TRUE);
quit: return(FALSE);
}


preprcs()  /* pre-processor */
{
    int oldp=bufp, linep=line_no;

    glbptr=bufp;
    if (matchtoken("#define "))
    {
        if (!cnst_id()) goto quit;
        if (!constant()) goto quit;
    }
    else if (matchtoken("#include "))
    {
        if (!file_def()) goto quit;
    }
    else goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


file_def()  /* file definition */
{
    int  oldp=bufp, linep=line_no;
    char limiter;

    if (match('"'))       limiter='"';
    else if (match('<'))  limiter='<';
    else                  goto quit;

    if (!filename()) goto quit;

    if (limiter=='"')
    {
        if (!match('"')) goto quit;
    }
    else if (!match('>')) goto quit;

    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


filename()  /* file name */
{
    if (!id()) return(FALSE);
    if (filetype())
        ;
    return(TRUE);
}


filetype()  /* file type */
{
    int oldp=bufp, linep=line_no;

    if (!match('.')) goto quit;
    if (!id()) goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


data_def()  /* data definition */
{
    int oldp=bufp, linep=line_no;

    glbptr=bufp;
    if (sc_spcfr())
        ;
    if (dclrtion())
        return(TRUE);

quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


sc_spcfr()  /* sc specifier */
{
    if (matchtoken("auto "))
        ;
    else if (matchtoken("static "))
        ;
    else if (matchtoken("extern "))
        ;
    else if (matchtoken("register "))
        ;
    else return(FALSE);
    return(TRUE);
}


dclrtion()  /* declaration */
{
    int oldp=bufp, linep=line_no;

    if (!typ_spf()) goto quit;
    if (!var_dec_list()) goto quit;
    if (!match(';')) goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


typ_spf()  /* type specifier */
{
    if (matchtoken("char "))
        ;
    else if (matchtoken("short "))
        ;
    else if (matchtoken("int "))
        ;
    else if (matchtoken("long "))
        ;
    else if (matchtoken("unsigned "))
        ;
    else if (matchtoken("float "))
        ;
    else if (matchtoken("double "))
        ;
    else
        return(FALSE);
    return(TRUE);
}


var_dec_list()  /* variable declaration list */
{
    if (!vardclr())
        return(FALSE);
    while (morevardcls())
        ;
    return(TRUE);
}


morevardcls()  /* more variable declarations */
{
    int oldp=bufp, linep=line_no;

    if (!match(','))
        goto quit;
    if (!vardclr())
        goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


vardclr()  /* variable declaration */
{
    int oldp=bufp, linep=line_no;

    if (match('*'))
        ;
    if (!id())
        goto quit;
    if (indxdclr())
        ;
    if (initializer())
        ;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


indxdclr()  /* index declaration */
{
    int oldp=bufp, linep=line_no;

    if (!match('['))
        goto quit;
    if (cnst_expr())
        ;
    if (!match(']'))
        goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


initializer()  /* initializer */
{
    int oldp=bufp, linep=line_no;

    if (!match('='))
        goto quit;
    if (match('{'))
    {
        if (!expression()) goto quit;
        while (moreexpr())
            ;
        if (!match('}')) goto quit;
    }
    else if (!expression())
        goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


func_def()  /* function definition */
{
    int oldp=bufp, linep=line_no;

    glbptr=bufp;
    if (typ_spf())
        ;
    if (!func_dclr())
        goto quit;
    glbptr=bufp;
    if (!func_body())
        goto quit;
    func_end=TRUE;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


func_dclr()  /* function declaration */
{
    int oldp=bufp, linep=line_no;

    if (!id())
        goto quit;
    if (!match('(')) goto quit;
    if (idnfrs())
        ;
    if (!match(')')) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


idnfrs()  /* identifiers */
{
    if (!id())
        return(FALSE);
    while (more_id())
        ;
    return(TRUE);
}


more_id()  /* more identifiers */
{
    int oldp=bufp, linep=line_no;

    if (!match(',')) goto quit;
    if (!id())
        goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


func_body()  /* function body */
{
    int oldp=bufp, linep=line_no;

    if (!type_dec_lst())
        goto quit;
    glbptr=bufp;
    if (!cmpn_stmt())
        goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


type_dec_lst()  /* type declaration list */
{
    int oldp=bufp, linep=line_no;

    if (par_dclrtion())
        ;
    while (par_dclrtion())
        ;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


par_dclrtion()  /* parameter declarations */
{
    int oldp=bufp, linep=line_no;

    if (!typ_spf()) goto quit;
    if (!par_dec_list()) goto quit;
    if (!match(';')) goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


par_dec_list()  /* parameter declaration list */
{
    if (!parameter()) return(FALSE);
    while (morepardcls())
        ;
    return(TRUE);
}


morepardcls()  /* more parameter declarations */
{
    int oldp=bufp, linep=line_no;

    if (!match(',')) goto quit;
    if (!parameter()) goto quit;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


parameter()  /* parameter */
{
    int oldp=bufp, linep=line_no;

    if (match('*'))
        ;
    if (!id()) goto quit;
    if (indxdclr())
        ;
    return(TRUE);
quit: bufp=oldp; line_no=linep; nextch=buf[bufp];
    return(FALSE);
}


stmt()  /* statement */
{
    if (cmpn_stmt())
        ;
    else if (if_stmt())
        ;
    else if (while_stmt())
        ;
    else if (do_stmt())
        ;
    else if (for_stmt())
        ;
    else if (swtc_stmt())
        ;
    else if (break_stmt())
        ;
    else if (matchtoken("continue ",1))
    {   if (!match(';')) goto quit;
    }
    else if (rtrn_stmt())
        ;
    else if (goto_stmt())
        ;
    else if (func_call())
    {   if (!match(';')) goto quit;
    }
    else if (asnmt())
    {   if (!match(';')) goto quit;
    }
    else if (label())
        ;
    else if (match(';'))
        ;
    else
        goto quit;
    return(TRUE);
quit: return(FALSE);
}


cmpn_stmt()  /* compound statement */
{
    int oldp=bufp, linep=line_no;

    if (!match('{')) goto quit;
    while (dclrtion())
        ;
    if (!stmt()) goto quit;
    while (stmt())
        ;
    if (!match('}')) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


func_call()  /* function call */
{
    int oldp=bufp, linep=line_no;

    if (!id()) goto quit;
    if (!match('(')) goto quit;
    if (expr_lst())
        ;
    if (!match(')')) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


expr_lst()  /* expression list */
{
    if (!expression()) return(FALSE);
    while (moreexpr())
        ;
    return(TRUE);
}


moreexpr()  /* more expressions */
{
    int oldp=bufp, linep=line_no;

    if (!match(',')) goto quit;
    if (!expression()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


asnmt()  /* assignment statement */
{
    if (assign())
        ;
    else if (incr_stmt())
        ;
    else return(FALSE);
    return(TRUE);
}


assign()  /* simple assignment */
{
    int oldp=bufp, linep=line_no;

    if (!lvalue()) goto quit;
    if (match('='))
    {   if (!lgc_expr()) goto quit;
    }
    else if (shf_asm_op())
    {   if (!shf_expr()) goto quit;
    }
    else if (btw_asm_op())
    {   if (!btw_expr()) goto quit;
    }
    else goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


shf_asm_op()  /* shift assignment operator */
{
    if (matchtoken("+= ",0))
        ;
    else if (matchtoken("-= ",0))
        ;
    else if (matchtoken("*= ",0))
        ;
    else if (matchtoken("/= ",0))
        ;
    else if (matchtoken("%= ",0))
        ;
    else if (matchtoken(">>= ",0))
        ;
    else if (matchtoken("<<= ",0))
        ;
    else return(FALSE);
    return(TRUE);
}


btw_asm_op()  /* bitwise assignment operator */
{
    if (matchtoken("&= ",0))
        ;
    else if (matchtoken("^= ",0))
        ;
    else if (matchtoken("|= ",0))
        ;
    else return(FALSE);
    return(TRUE);
}


incr_stmt()  /* incremental statement */
{
    int  oldp=bufp, linep=line_no;
    char pre_op=TRUE;  /* pre-operator */

    if (matchtoken("++ ",0))
        ;
    else if (matchtoken("-- ",0))
        ;
    else pre_op=FALSE;

    if (!lvalue()) goto quit;

    if (!pre_op)
    {
        if (matchtoken("++ ",0))
            ;
        else if (matchtoken("-- ",0))
            ;
        else goto quit;
    }

    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


if_stmt()  /* if statement */
{
    if (!matchtoken("if ",1)) goto quit;
    if (!match('(')) goto quit;
    if (!lgc_expr()) goto quit;
    if (!match(')')) goto quit;
    if (!stmt()) goto quit;
    if (else_stmt())
        ;
    return(TRUE);
quit: return(FALSE);
}


else_stmt()  /* else statement */
{
    if (!matchtoken("else ",1)) goto quit;
    if (!stmt()) goto quit;
    return(TRUE);
quit: return(FALSE);
}


while_stmt()  /* while statement */
{
    if (!matchtoken("while ",1)) goto quit;
    if (!match('(')) goto quit;
    if (!lgc_expr()) goto quit;
    if (!match(')')) goto quit;
    if (!stmt()) goto quit;
    return(TRUE);
quit: return(FALSE);
}


do_stmt()  /* do statement */
{
    if (!matchtoken("do ",1)) goto quit;
    if (!stmt()) goto quit;
    if (!matchtoken("while ",1)) goto quit;
    if (!match('(')) goto quit;
    if (!lgc_expr()) goto quit;
    if (!match(')')) goto quit;
    if (!match(';')) goto quit;
    return(TRUE);
quit: return(FALSE);
}


for_stmt()  /* for statement */
{
    if (!matchtoken("for ",1)) goto quit;
    if (!match('(')) goto quit;
    if (asn_lst())
        ;
    if (!match(';')) goto quit;
    if (!lgc_expr()) goto quit;
    if (!match(';')) goto quit;
    if (asn_lst())
        ;
    if (!match(')')) goto quit;
    if (!stmt()) goto quit;
    return(TRUE);
quit: return(FALSE);
}


asn_lst()  /* assignment list */
{
    if (!asnmt())
        return(FALSE);
    while (more_asnmt())
        ;
    return(TRUE);
}


more_asnmt()  /* more assignments */
{
    int oldp=bufp, linep=line_no;

    if (!match(',')) goto quit;
    if (!asnmt()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


swtc_stmt()  /* switch statement */
{
    if (!matchtoken("switch ",1)) goto quit;
    if (!match('(')) goto quit;
    if (!art_expr()) goto quit;
    if (!match(')')) goto quit;
    if (!match('{')) goto quit;
    if (!case_stmt()) goto quit;
    while (case_stmt())
        ;
    if (!match('}')) goto quit;
    return(TRUE);
quit: return(FALSE);
}


case_stmt()  /* case statement */
{
    if (matchtoken("case ",1))
    {   if (!cnst_expr()) goto quit;
    }
    else if (matchtoken("default ",0))
        ;
    else goto quit;
    if (!match(':')) goto quit;
    while (stmt())
        ;
    return(TRUE);
quit: return(FALSE);
}


break_stmt()  /* break statement */
{
    if (!matchtoken("break ",1)) goto quit;
    if (!match(';')) goto quit;
    return(TRUE);
quit: return(FALSE);
}


rtrn_stmt()  /* return statement */
{
    if (!matchtoken("return ",1)) goto quit;
    if (expression())
        ;
    if (!match(';')) goto quit;
    return(TRUE);
quit: return(FALSE);
}


goto_stmt()  /* goto statement */
{
    if (!matchtoken("goto ",1)) goto quit;
    if (!id()) goto quit;
    if (!match(';')) goto quit;
    return(TRUE);
quit: return(FALSE);
}

label()  /* label */
{
    int oldp=bufp, linep=line_no;

    if (!id()) goto quit;
    if (!match(':')) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


expression()  /* expression */
{
    if (string())
        ;
    else if (pntr_expr())
        ;
    else if (addr_expr())
        ;
    else if (lgc_expr())
        ;
    else if (incr_stmt())
        ;
    else return(FALSE);
    return(TRUE);
}


pntr_expr()  /* pointer expression */
{
    int oldp=bufp, linep=line_no;

    if (!match('*'))
        goto quit;
    if (array_elm())
        ;
    else if (id())
        ;
    else if (match('('))
    {
        if (!art_expr()) goto quit;
        if (!match(')')) goto quit;
    }
    else goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp];
    return(FALSE);
}


addr_expr()  /* address expression */
{
    int oldp=bufp, linep=line_no;

    if (!match('&'))
        goto quit;


lg_fcts()  /* logic factors */
{
    int oldp=bufp, linep=line_no;

    if (!matchtoken("&& ",0)) goto quit;
    if (!lgc_fct()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


lgc_fct()  /* logic factor */
{
    int oldp=bufp, linep=line_no;

    if (match('!'))  /* unary operator */
        ;
    if (btw_expr())
        ;
    else if (match('('))
    {
        if (!lgc_expr()) goto quit;
        if (!match(')')) goto quit;
    }
    else goto quit;

    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


btw_expr()  /* bitwise expression */
{
    int oldp=bufp, linep=line_no;

    if (match('~'))
        ;
    if (!btw_trm()) goto quit;
    while (bt_trms())
        ;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


bt_trms()  /* bitwise terms */
{
    int oldp=bufp, linep=line_no;

    if (!match('|')) goto quit;
    if (!btw_trm()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


btw_trm()  /* bitwise term */
{
    if (!btw_fct()) return(FALSE);
    while (bt_fcts())
        ;
    return(TRUE);
}


bt_fcts()  /* bitwise factors */
{
    int oldp=bufp, linep=line_no;

    if (!match('^')) goto quit;
    if (!btw_fct()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


btw_fct()  /* bitwise factor */
{
    if (!btw_elm()) return(FALSE);
    while (bt_elms())
        ;
    return(TRUE);
}


bt_elms()  /* bitwise elements */
{
    int oldp=bufp, linep=line_no;

    if (!match('&'))
        goto quit;
    if (!btw_elm())
        goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


btw_elm()  /* bitwise element */
{
    int oldp=bufp, linep=line_no;

    if (cmp_expr())
        ;
    else if (match('('))
    {
        if (!btw_expr())
            goto quit;
        if (!match(')'))
            goto quit;
    }
    else
        goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


cmp_expr()  /* compare expression */
{
    int oldp=bufp, linep=line_no;

    if (!cmp_trm())
        return(FALSE);
    while (cp_trms())
        ;
    return(TRUE);
}


cp_trms()  /* compare terms */
{
    int oldp=bufp, linep=line_no;

    if (!equ_op()) goto quit;
    if (!cmp_trm()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


equ_op()  /* equality operators */
{
    if (matchtoken("== ",0))
        ;
    else if (matchtoken("!= ",0))
        ;
    else
        return(FALSE);
    return(TRUE);
}


cmp_trm()  /* compare term */
{
    if (!cmp_fct())
        return(FALSE);
    while (cp_fcts())
        ;
    return(TRUE);
}


cp_fcts()  /* compare factors */
{
    if (!rel_op()) goto quit;
    if (!cmp_fct()) goto quit;
    return(TRUE);
quit: return(FALSE);
}


rel_op()  /* relational operator */
{
    if (match('<'))
    {   if (match('='));
    }
    else if (match('>'))
    {   if (match('='));
    }
    else return(FALSE);
    return(TRUE);
}


cmp_fct()  /* compare factor */
{
    int oldp=bufp, linep=line_no;

    if (shf_expr())
        ;
    else if (match('('))
    {
        if (!cmp_expr()) goto quit;
        if (!match(')')) goto quit;
    }
    else goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


shf_expr()  /* shift expression */
{
    int oldp=bufp, linep=line_no;

    if (shf_init())
        ;
    if (!art_expr()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


shf_init()  /* shift expression-initial */
{
    int oldp=bufp, linep=line_no;

    if (!lvalue()) goto quit;
    if (!shf_op()) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


shf_op()  /* shift operator */
{
    if (matchtoken(">> ",0))
        ;
    else if (matchtoken("<< ",0))
        ;
    else return(FALSE);
    return(TRUE);
}


art_expr()  /* arithmetic expression */
{
    int oldp=bufp, linep=line_no;

    if (match('-'))  /* unary operator */
        ;
    if (!term()) goto quit;
    while (more_term())
        ;

    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


more_term()  /* more terms */
{
    if (!add_op()) goto quit;
    if (!term()) goto quit;
    return(TRUE);
quit: return(FALSE);
}


add_op()  /* additional operator */
{
    if (match('+'))
        ;
    else if (match('-'))
        ;
    else return(FALSE);
    return(TRUE);
}


term()  /* term */
{
    if (!factor())
        return(FALSE);
    while (more_fcts())
        ;
    return(TRUE);
}


more_fcts()  /* more factors */
{
    int oldp=bufp, linep=line_no;

    if (!mul_op()) goto quit;
    if (!factor()) goto quit;

    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


mul_op()  /* multiplicational operator */
{
    if (match('*'))
        ;
    else if (match('/'))
        ;
    else if (match('%'))
        ;
    else return(FALSE);
    return(TRUE);
}


factor()  /* factor */
{
    int oldp=bufp, linep=line_no;

    if (match('('))
    {
        if (!art_expr()) goto quit;
        if (!match(')')) goto quit;
    }
    else if (func_call())
        ;
    else if (cnst_expr())
        ;
    else if (char_def())
        ;
    else if (incr_stmt())
        ;
    else if (lvalue())
        ;
    else goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


cnst_expr()  /* constant expression */
{
    if (constant())
        ;
    else if (!cnst_id())
        return(FALSE);
    return(TRUE);
}


lvalue()  /* left value */
{
    int  oldp=bufp, linep=line_no;
    char prnthsis=FALSE;

    if (match('(')) prnthsis=TRUE;
    if (array_elm())
        ;
    else if (id())
        ;
    else if (pntr_expr())
        ;
    else goto quit;
    if (prnthsis)
        if (!match(')')) goto quit;
    return(TRUE);
quit: bufp=oldp; nextch=buf[bufp]; line_no=linep;
    return(FALSE);
}


array_elm()  /* array element */
{
    int oldp=bufp, linep=line_no;

    if (!id())    goto quit;
    if (!index()) goto quit;
    return(TRUE);

quit:  bufp=oldp;  nextch=buf[bufp];  line_no=linep;
       return(FALSE);
}


index()  /* index expression for arrays */
{
    int oldp=bufp, linep=line_no;

    if (!match('[')) goto quit;
    if (art_expr())
        ;
    if (!match(']')) goto quit;
    return(TRUE);

quit:  bufp=oldp;  nextch=buf[bufp];  line_no=linep;
       return(FALSE);
}



primary()  /* primary expression */
{
    if (cnst_expr())
        ;
    else if (array_elm())
        ;
    else if (id())
        ;
    else if (char_def())
        ;
    else if (string())
        ;
    else return(FALSE);

    return(TRUE);
}
APPENDIX C


TERMINALS AND BASIC NONTERMINALS


isaltr(c)  /* is a letter? */
int c;     /* character to test */
{
    return ((c>='A' && c<='Z') || (c>='a' && c<='z'));
}


iscapch(c)  /* is a capital letter? */
int c;      /* character to test */
{
    return (c>='A' && c<='Z');
}


isadgt(c)  /* is a digit? */
int c;     /* character to test */
{
    return (c>='0' && c<='9');
}


isidch(c)  /* is identifier character? */
int c;
{
    return (isaltr(c) || isadgt(c) || c=='_');
}


delimiter!) 

< 

return ( 


/*  is  next-character  a  delimiter? 


*/ 


next  ch== 

i  -i 

1 1 

nextch== 

1  <’ 

1  1 

nextch==  ’ 

>  ’ 

nextch== 

»  +» 

i  i 

next  ch== 

9  —  1 

1 1 

nextch==  ’ 

1  ’ 

nextch== 

»  *’ 

i  i 

nextch== 

1  /’ 

1 1 

nexteh==  ’ 

&’ 

nextch« 

i  .  i 

1 1 

nextch== 

9  1 

* 

1  1 

next  ch==  ’ 

■  9 

nextch== 

’  )  ’ 

i  i 

nextch« 

1  -M 

1 1 

nextch==  ’ 

1  9 

next  ch== 

’  x» 

i  i 

nextch®* 

1  <’ 

1  1 

nextch==  ’ 

)  ’ 

nextch== 

’  c» 

i  i 

next  ch== 

’  1  ’ 

1  1 

nextch==’  . 

9 

whtchr (next ch ) 

) ; 

whtchr(c)  /* is a white-character? */
int c;     /* character to test */
{
    return (c==' ' || c==TAB || c==CR || c==LF);
}



char_def()  /* is character definition? */
{
    char blank=FALSE;  /* boolean var. for blanks */

    delwht();          /* skip white characters */

    /* character definition should start with ' character */
    if (!match('\'')) return(FALSE);

    /* consume the following white characters, if there is any */
    while (whtchr(nextch))
    {
        nextch = getchr();
        blank = TRUE;
    }

    if (nextch=='\'')
    {
        /* check if character body is empty */
        if (!blank)
            /* illegal character definition */ err_msg(ICDF);

        /* else it is a blank character */
        else
        {
            nextch=getchr();
            return(TRUE);
        }
    }

    /* if met '\' character, parse one more */
    if (nextch=='\\') nextch=getchr();

    /* parse the original character */
    nextch=getchr();

    /* should finish with ' character */
    if (!match('\''))
        /* illegal character definition */ err_msg(ICDF);

    return(TRUE);
}


string()  /* is string? */
{
    char i;    /* index variable */

    delwht();  /* skip white characters */

    /* string must start with " character */
    if (!match('"')) return(FALSE);

    /* since strings not implemented in Tiny-C, just consume it */
    for (i=0; (nextch!='"') && (i<=MXSTR); ++i)
        nextch=getchr();

    /* check if it is too long */
    if (i > MXSTR)
        /* string length too long */ err_msg(SLTL);

    /* should finish with " character */
    match('"');

    return(TRUE);
}


constant()  /* integer constant */
{
    char i=0;  /* index variable */

    delwht();  /* skip white characters */

    /* it should start with a digit */
    if (!isadgt(nextch)) return(FALSE);

    while (isadgt(nextch))  /* parse the number */
    {
        /* check if number length is too long */
        if (i>=MXNML)
            /* number length too long */ err_msg(NLTL);
        else num_name[i++]=nextch;

        nextch=getchr();
    }

    if (nextch!=' ' && !delimiter())
        /* delimiter was expected */ err_msg(DWEX);

    num_name[i]=' ';

    /* convert string "num_name" into numeric value */
    str_num();

    /* add number into constant table */
    add_num();

    return(TRUE);
}


cnst_id()  /* constant identifier */
{
    int oldp, linep;
    char i=0;  /* index variable */

    delwht();  /* skip white characters */

    oldp=bufp;  linep=line_no;
    nextch=buf[bufp];

    /* first character should be a capital letter */
    if (!iscapch(nextch)) return(FALSE);

    while (iscapch(nextch))
    {
        /* check if identifier length is too long */
        if (i>=MXIDL)
            /* identifier length too long */ warning(ILTL);
        else id_name[i++]=nextch;

        nextch=getchr();
    }

    /* if following character is still a letter, it can be a lower
       letter only, since Tiny-C assumes constant identifiers are
       all capital letters, this cannot be a constant identifier */
    if (isaltr(nextch)) goto quit;

    if (nextch!=' ' && !delimiter())
        /* delimiter was expected */ err_msg(DWEX);

    id_name[i]=' ';

    return(TRUE);

    /* backtrack on the scanner buffer, and return FALSE */
quit:  bufp=oldp;  line_no=linep;  nextch=buf[bufp];
       return(FALSE);
}

id()  /* is identifier? */
{
    char i=0;  /* index variable */

    delwht();  /* skip white characters */

    /* should start with a letter */
    if (!isaltr(nextch)) return(FALSE);

    while (isidch(nextch))
    {
        /* check if identifier length is too long */
        if (i>=MXIDL)
            /* identifier length too long */ warning(ILTL);
        else id_name[i++]=nextch;

        nextch=getchr();
    }

    /* following character must be a delimiter! */
    if (nextch!=' ' && !delimiter())
        /* delimiter was expected */ err_msg(DWEX);

    id_name[i]=' ';

    /* if identifier is a reserved word, give error message */
    if (!rsvr_test())
        /* reserved word not expected */ err_msg(RVNE);

    return(TRUE);
}
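
The scanning loop in id() keeps consuming identifier characters after the length limit is reached and only stops storing them. A minimal standalone sketch of that behaviour follows. It is an illustration, not the thesis routine: the function scan_id, the limit MAX_ID (standing in for MXIDL, whose value is not given in the listing), and the use of a NUL terminator and of <ctype.h> are all assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_ID 8   /* hypothetical limit; the listing calls its limit MXIDL */

/* Copies the identifier at the start of src into out (at most MAX_ID
   characters plus a terminating NUL) and returns the number of source
   characters consumed, or 0 if src does not start with a letter.
   *truncated is set to 1 when characters beyond MAX_ID were dropped. */
size_t scan_id(const char *src, char out[MAX_ID + 1], int *truncated)
{
    size_t n = 0;      /* characters consumed from src */
    size_t kept = 0;   /* characters stored in out     */

    assert(src != NULL && out != NULL && truncated != NULL);
    *truncated = 0;

    if (!isalpha((unsigned char)src[0])) {   /* must start with a letter */
        out[0] = '\0';
        return 0;
    }
    while (isalnum((unsigned char)src[n]) || src[n] == '_') {
        if (kept < MAX_ID)
            out[kept++] = src[n];            /* store while there is room */
        else
            *truncated = 1;                  /* consume but do not store  */
        ++n;
    }
    out[kept] = '\0';
    return n;
}
```

A caller in the spirit of id() would issue a warning when the truncated flag is set and then check that the character at the returned offset is a delimiter.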
APPENDIX D


TINY-C COMPILER ERROR AND WARNING MESSAGES


Error Messages:


#define AVNI   1  /* auto variables not implemented */
#define SVNI   2  /* static variables not implemented */
#define EVNI   3  /* external variables not implemented */
#define RVNI   4  /* register variables not implemented */
#define SMEX   5  /* semicolon was expected */
#define LINI   6  /* long integers not implemented */
#define UINI   7  /* unsigned integers not implemented */
#define FPNI   8  /* floating points not implemented */
#define DPNI   9  /* double precisions not implemented */
#define IDEX  10  /* identifier was expected */
#define IBSB  11  /* index body was supposed to be blank */
#define RSBE  12  /* right square bracket was expected */
#define CINI  14  /* compound initializers not implemented */
#define EXEX  15  /* expression was expected */
#define LCBE  16  /* left curly bracket was expected */
#define LPEX  17  /* left parenthesis was expected */
#define RPEX  18  /* right parenthesis was expected */
#define IEAC  19  /* identifier was expected after comma */
#define EEAC  20  /* expression was expected after comma */
#define PTNI  21  /* pointers not implemented */
#define ARNI  22  /* arrays not implemented */
#define IPNI  55  /* include preprocessor not implemented */
#define FTEX  23  /* filetype was expected */
#define IVFD  24  /* invalid file definition */
#define RCBE  25  /* right curly bracket was expected */
#define PREX  26  /* parameter was expected */
#define PREC  27  /* parameter expected after comma */
#define AEAC  28  /* an assignment expected after comma */
#define LPEI  29  /* left parenthesis expected after if */
#define LPEW  30  /* left parenthesis expected after while */
#define LPEF  31  /* left parenthesis expected after for */
#define LPES  41  /* left parenthesis expected after switch */
#define ILEI  32  /* illegal logic expression in if */
#define ILEW  33  /* illegal logic expression in while */
#define ILEF  34  /* illegal logic expression in for */
#define WMFD  35  /* while is missing from do */
#define SSFI  36  /* a statement should follow after if */
#define SSFE  37  /* a statement should follow after else */
#define SSFW  38  /* a statement should follow after while */
#define SSFF  39  /* a statement should follow after for */
#define SSFD  54  /* a statement should follow after do */
#define SMIF  40  /* semicolon is missing in for */
#define IAES  42  /* illegal arith. expression in switch */
#define CSMS  43  /* case statement is missing */
#define CLIM  44  /* colon is missing */
#define ICEC  45  /* invalid constant exprs. after case */
#define AENI  46  /* address expression not implemented */
#define OCNI  47  /* one's complement not implemented */
#define BONI  48  /* bitwise operators not implemented */
#define SENI  49  /* shift expressions not implemented */
#define INPE  50  /* invalid pointer expression */
#define INAE  51  /* invalid address expression */
#define UNVR  52  /* unknown variable */
#define IPEI  53  /* invalid arith. expr. in array index */
#define RVNE  56  /* reserved word not expected */
#define ILFB  57  /* illegal function body */
#define CBPR  58  /* input couldn't be parsed */
#define TBBP  59  /* too big block to parse */
#define UEOF  60  /* unexpected end of file */
#define CETL  61  /* comment endless or too long */
#define SETL  62  /* string is endless or too long */
#define UMPH  63  /* unmatched parenthesis */
#define SMUP  64  /* semicolon missing/unmatched parenthesis */
#define CIEX  65  /* constant identifier expected */
#define CVEX  66  /* constant value expected */
#define STIF  67  /* symbol table is full */
#define NLTL  68  /* numeric length too long */
#define TBNV  69  /* too big numeric value */
#define NSIF  70  /* name string is full */
#define DTIF  71  /* definition table is full */
#define CTIF  72  /* constant table is full */
#define VSIF  73  /* variable string is full */
#define LTIF  74  /* label table is full */
#define DCID  75  /* duplicated cons. id declaration */
#define DLDC  76  /* duplicated label declaration */
#define ICDF  77  /* illegal character definition */
#define SLTL  78  /* string length too long */
#define DWEX  79  /* delimiter was expected */
#define UDLB  80  /* undeclared label */
#define DPDC  81  /* duplicated parameter declaration */
#define DPFA  82  /* declared parameter is not a fun. arg. */
#define UNPE  83  /* undeclared parameter exists */
#define DFDC  84  /* duplicated function declaration */
#define ICAN  86  /* inconsistent argument number */
#define DDDC  87  /* duplicated default declaration */
#define IVBR  88  /* invalid break usage */
#define TMNL  89  /* too many nested level */
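
The listing calls err_msg() with these codes, but the body of err_msg() is not part of this appendix. One plausible way to turn a code into its message text is a small lookup table. The sketch below is hypothetical: the names err_entry, err_table, and err_text are invented for illustration, and only three of the codes above are included to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Three codes copied from the table above; the rest are omitted here. */
#define SMEX  5   /* semicolon was expected  */
#define IDEX 10   /* identifier was expected */
#define DWEX 79   /* delimiter was expected  */

/* Hypothetical table entry pairing a code with its message text. */
struct err_entry {
    int code;
    const char *text;
};

static const struct err_entry err_table[] = {
    { SMEX, "semicolon was expected"  },
    { IDEX, "identifier was expected" },
    { DWEX, "delimiter was expected"  },
};

/* Returns the message for code, or a fallback text when the code is
   not in the table. Codes in the appendix start at 1. */
const char *err_text(int code)
{
    size_t i;

    assert(code > 0);
    for (i = 0; i < sizeof err_table / sizeof err_table[0]; ++i)
        if (err_table[i].code == code)
            return err_table[i].text;
    return "unknown error";
}
```

A linear search is used because the codes are not contiguous (for example 13 and 85 are unused), so indexing an array directly by code would leave holes.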

Warning Messages:


#define AFRI   1  /* all functions return integer */
#define CSIB   2  /* compound statement is blank */
#define ILTL   3  /* identifier length too long */
#define TOMF   4  /* main function is missing */
APPENDIX E


INTERMEDIATE CODE DEFINITIONS FOR TINY-C


#define IADD   1  /* integer addition */
#define ISUB   2  /* integer subtraction */
#define IMUL   3  /* integer multiply */
#define IDIV   4  /* integer division */
#define MDLS   5  /* integer modulus */
#define IMLB   6  /* label declaration */
#define JUMP   7  /* unconditional jump */
#define JPTR   8  /* jump if true */
#define JPFL   9  /* jump if false */
#define LGEX  10  /* logic expression */
#define CONS  11  /* constant */
#define ARID  12  /* array identifier */
#define VARB  13  /* variable */
#define UNMS  14  /* unary minus */
#define LGNT  15  /* unary logic not */
#define EQLT  18  /* equality */
#define NTEQ  19  /* not equal */
#define LSTN  20  /* less than */
#define GRTN  21  /* greater than */
#define LTEQ  22  /* less than or equal */
#define GTEQ  23  /* greater than or equal */
#define LGAN  24  /* logic and */
#define LGOR  25  /* logic or */
#define ASSN  26  /* assignment */
#define ADAS  27  /* addition-assignment */
#define SBAS  28  /* subtraction-assign */
#define MLAS  29  /* multiply-assignment */
#define DVAS  30  /* division-assignment */
#define MDAS  31  /* modulus-assignment */
#define FNCL  32  /* function call */
#define ARGM  33  /* argument */
#define EXLB  34  /* explicit label */
#define GOTO  35  /* jump to exp. label */
#define CASE  36  /* case statement */
#define TVAR  37  /* temporary variable */
#define SWTC  38  /* switch statement */
#define PNTR  39  /* pointer */
#define ADDR  40  /* address */
#define RTRN  41  /* return */
#define INDX  42  /* index */
#define FNDC  45  /* function declaration */
#define STMT  46  /* statement */
#define DUMV  47  /* dummy statement */
#define BREK  48  /* break statement */
#define DFLT  49  /* default case */
#define INCR  50  /* increment */
#define DCRT  51  /* decrement */
#define INCL  52  /* increment, later */
#define DCRL  53  /* decrement, later */
#define NOOP  54  /* no operation */
#define TEMP  55  /* temporary variable */
#define CNVB  56  /* convert to boolean */
#define BTEM  57  /* boolean temporary */
#define FEND  58  /* function end */
#define MAIN  59  /* main function */
#define MEND  60  /* end of main function */
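
This appendix gives only the numeric codes; how the compiler stores intermediate instructions is not shown here. As a rough illustration of how such opcodes are commonly used, the sketch below records instructions as quadruples (operator, two operands, result) and interprets three of them. The quadruple layout, the emit() and run() functions, and the convention that operands index a value array are assumptions for this example, not the thesis design.

```c
#include <assert.h>
#include <stddef.h>

/* Three opcodes copied from the table above. */
#define IADD  1   /* integer addition */
#define IMUL  3   /* integer multiply */
#define ASSN 26   /* assignment       */

/* Hypothetical quadruple: every operand is an index into a value array. */
struct quad {
    int op;
    int arg1;
    int arg2;
    int result;
};

#define MAX_QUADS 64
static struct quad code[MAX_QUADS];
static size_t ncode = 0;

/* Appends one quadruple and returns its position in the code array. */
size_t emit(int op, int arg1, int arg2, int result)
{
    assert(ncode < MAX_QUADS);
    code[ncode].op = op;
    code[ncode].arg1 = arg1;
    code[ncode].arg2 = arg2;
    code[ncode].result = result;
    return ncode++;
}

/* Executes the emitted quadruples over val[].
   Returns 0 on success, -1 on an opcode this sketch does not handle. */
int run(int val[])
{
    size_t i;

    for (i = 0; i < ncode; ++i) {
        const struct quad *q = &code[i];
        switch (q->op) {
        case IADD: val[q->result] = val[q->arg1] + val[q->arg2]; break;
        case IMUL: val[q->result] = val[q->arg1] * val[q->arg2]; break;
        case ASSN: val[q->result] = val[q->arg1];                break;
        default:   return -1;
        }
    }
    return 0;
}
```

For a statement such as joe=jimmy+27, with joe in slot 0, jimmy in slot 1, the constant in slot 2, and a temporary in slot 3, one would emit an IADD into the temporary followed by an ASSN into slot 0.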
APPENDIX F


TEST PROGRAMS FOR THE TINY-C COMPILER


Program 1.

main()
{
    int joe, jimmy;

    joe=5;
    jimmy=15;
    switch ( joe * 5 )
    {
        case 12:  ++joe;
                  break;
        default:  joe=jimmy+27;
                  break;
        case 14:  joe=jimmy--;
                  break;
    }
}
codeseg   equ    (0:0)
dataseg   equ    (1:0)
          org    dataseg
sym1      ds     1
sym2      ds     1
          org    codeseg
          jmp    main
main:
          move   {int,5},r(0:0)
          move   {int,15},r(0:1)
          mov    r(0:0),r(0:2)
          mul    r(0:0),r(0:2)
          move   r(0:0),sym1
          move   r(0:1),sym2
          move   r(0:2),sym3
          jmp    imlbl0
imlbl2:
          move   sym1,r(0:0)
          add    {int,1},r(0:0),r(0:1)
          move   r(0:1),sym1
          jmp    imlbl1
imlbl3:
          move   {int,27},r(0:0)
          move   sym2,r(0:1)
          add    r(0:1),r(0:0)
          move   r(0:0),sym1
          jmp    imlbl1
imlbl4:
          move   sym2,r(0:0)
          sub    {int,1},r(0:0),r(0:1)
          move   r(0:0),sym1
          move   r(0:1),sym2
          jmp    imlbl1
imlbl0:
          move   sym3,r(0:0)
          move   {int,12},r(0:1)
          if     r(0:0)==r(0:1),implbl2
          move   {int,14},r(0:2)
          if     r(0:0)==r(0:2),implbl4
          jmp    implbl3
imlbl1:
          stop

Program 2.

main()
{
    int joe, jimmy;

    joe=3;
    jimmy=joe*37;
    if ( joe > 37 && jimmy<=joe && jimmy )
    {
        jimmy=18;
    }
    else
        jimmy=27;
    joe=jimmy+25;
}
The Code for the Program 2


codeseg
dataseg

sym1
sym2

blbl0:
blbl1:
blbl2:
blbl3:
blbl4:
blbl5:
imlbl0:
move
imlbl1:
add
stop

(0:0)
(1:0)
dataseg
1
1
codeseg
{int,3},r(0:0)
{int,37},r(0:1)
r(0:0),r(0:1)
{int,37},r(0:2)
r(0:0)>r(0:2),blbl0
{bool,false},r(0:3)
blbl1
{bool,true},r(0:3)
r(0:1)<=r(0:0),blbl2
{bool,false},r(0:4)
blbl3
{bool,true},r(0:4)
r(0:3),r(0:4)
r(0:1)=={int,0},blbl4
{bool,true},r(0:5)
blbl5
{bool,false},r(0:5)
r(0:4),r(0:5)
r(0:0),sym1
r(0:1),sym2
r(0:5)=={bool,false},imlbl0:
{int,18},r(0:0)
r(0:0),sym2
imlbl1
{int,27},r(0:0)
r(0:0),sym2
{int,25},r(0:0)
sym2,r(0:1)
r(0:1),r(0:0)


Program 3.

function(joe, jimmy)
int joe, jimmy;
{
    do
        joe = jimmy++;
    while ( joe == 3 );

    --jimmy;
}


main()
{
    int joe, jimmy;

    jimmy=5;
    do
        joe = jimmy--;
    while ( joe == 3 );

    function(jimmy, joe);

    ++jimmy;
}
The Code for the Program 3


codeseg   equ    (0:0)
dataseg   equ    (1:0)
          org    dataseg
sym1      ds     1
sym2      ds     1
sym4      ds     1
sym5      ds     1
          org    codeseg
          jmp    main
function:
          pop    s(0),r(0:0)
          pop    s(0),r(0:1)
          move   r(0:0),sym1
          move   r(0:1),sym2
imlbl0:
          move   sym2,r(0:0)
          add    {int,1},r(0:0),r(0:1)
          move   {int,3},r(0:2)
          if     r(0:0)==r(0:2),blbl0
          move   {bool,false},r(0:3)
          jump   blbl1
blbl0:
          move   {bool,true},r(0:3)
blbl1:
          move   r(0:0),sym1
          move   r(0:1),sym2
          if     r(0:3)=={bool,true},imlbl0:
imlbl1:
          move   sym2,r(0:0)
          sub    {int,1},r(0:0),r(0:1)
          push   {int,1},s(0)
          rts    s(1)
main:
          move   {int,5},r(0:0)
          move   r(0:0),sym5
          move   r(0:1),sym2
imlbl2:
          move   sym5,r(0:0)
          sub    {int,1},r(0:0),r(0:1)
          move   {int,3},r(0:2)
          if     r(0:0)==r(0:2),blbl2
          move   {bool,false},r(0:3)
          jump   blbl3
blbl2:
          move   {bool,true},r(0:3)
          move   r(0:0),sym4
          move   r(0:1),sym5
          if     r(0:3)=={bool,true},imlbl
imlbl3:
          move   sym4,r(0:0)
          push   r(0:0),s(0)
          move   sym5,r(0:1)
          push   r(0:1),s(0)
          jsr    funcall,s(1)
          pop    s(0),r(0:2)
          add    {int,1},r(0:1),r(0:3)